Query: 1 MYLIEPIRNGKRITDGftlALAMQVYIIjQNVFLDDDILFPYYCDPKVEIGKFQNAVIETNQ 60
MYriIEPIRISIGKRITDGA+ALftMQVY+ +N+FLDDDILFPYYCDPKVEIGKFQNaV+ETNQ
Sbjct: 1 MYLIEPIRNGKRITDC3RVMiaMQVYVQEffnJFLDDDILFPyyCDPKVEIGKFQNA.VVETNQ 60

Query: 61 EYLKEHDIPVVRRDTGGGRVYVDSGa™iCYLMKDHGQFGDFKRAYEPAIKRLKTLGRSS 120

EYLKEH IPVVRRDTGGGAVYVDSGa.VNICYL+ D+G FGDFKR Y+PAI+AL LGA+
Sbjct: 61 EYLKEHHIPWRRDTGGGAVYVDSGAVNICYIiINDNGIFGDFKRTYQPAIEALHHLGATE 120

Query: 121 VEMREim)LVIK3KKVSGAaMTIVNGRIYGGYSLLLDVDFDAMEKVIOTNRKra 180 

VEM RNDLVIDGKKVSGAAMTI NGR+YGGYSLLLDVDF+AMEK L ENRKKIESKiGI+ 
Sbjct: 121 VEMSGraroLVIDGKOTSGAftOTIANGRVYGGYSLLLDVDFEaiffiKALKPim^^ 180 

Query: 181 SVRSRVGDIRSHliSEDYRHITTDQFKDLMVCQLLHIDHIDQAKRYHLTEKDWAAIDABAD 240 

SVRSRVG+IR HL+ Y+ IT ++FKDLMVCQLL 1+ I QAKRY LTEKDW IDAL + 
Sbjct: 181 SVRSRVGNIREHLAPQYQGITIEEPKDLMVCQLLQIETISQAKRYDLTEKDWQQIDALTE 240 

Query: 241 EK^KimiWNYGNSPQYSYHRDARFPSGTYDFHLEIEKGIITNCRIYGDFFSSKDISDIEN 300 

KY NW+WNYGN+PQY YHRD RF GT D HL+I+KG I CRIYGDFF DI+++E 
Sbjct: 241 RKYHNWEWNYG^IAPQYRYHRIX3RFTGGTVDIHLDIKKGYIAACRIYGDFFGKADIAELEG 300 

Query: 301 LLIGCPMKEELVLEKLSTLSLEDYFGQTSPEEIKAVLFS 339 

LIG M++E VL L+ + L Y G + EE+ ++FS 
Sbjct: 301 HLIGTRMEKEDVLATLNAIDIjAPYLGAITAEELGDLIFS 339 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2004 

A DNA sequence (GBSx2114) was identified in S.agalactiae <SEQ ID 6197> which encodes the amino 
acid sequence <SEQ ID 6198>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 196 - 212 ( 196 - 212)

Final Results

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB49329 GB:U39612 formyl-tetrahydrofolate synthetase
[Streptococcus mutans]
Identities = 432/556 (77%), Positives = 493/556 (87%)

Query: 1 MKTDIEIAQSVALKPIAEIVEQVGIGFDDIELYGKYKAKLSFDKIEAVKSQKVGKLILVT 60

MKTDIEIAQSV L+PI +V+++GI FDD+ELYGKYKAKIi+FDK;I+AV+ GKL+LVT
Sbjct: 1 MKS-DIEIAQSVDLRPITNVVKKIfilDFDDLELYGKYKftKLTFDKIKAVEENAPGKLV^ 60

Query: 61 AINPTPAGEGKSTMSIGLADALNKIGKKTMIALREPSLGFVMGIKGGAAGGGYAQVLPME 120

AlNPTPAGEGKST++IGLaDMjNKIGKKTMIA+REPSIiGPVM^
Sbjct: 61 AINPTPAGEGKSTITIGLADALNKIGKKTMIAIREPSLGPVMGIKGGAAGGGYAQVLPME 120

Query: 121 DINLHFTGDmAITTANNALSALLDNHIHQGNEUDIIXJRRVIWKRVVDLNDRAIiRQVIVG 180

DINLHFTGDMHAITTAKNALSAL+DNH+HQGNEL IDQRR+IWKRWDLNDRALR V VG
Sbjct: 121 DINLHFTGDMHAITTAMNALSALIDNHLHQGNELGIDQRRIIWKRWDLNDRALRHVTVG 180

Query: 181 LGSPVNGIPREIXSFDITVaSEIMAILCIATDLSDLKKRLSNIVVAYSRMRKPIYVKDLKI 240

IjGSP+NGIPREDGFDITVASEIMAILCIiAT++ DLK+RL+NIV+ Y +R P+YV+DL++
Sbjct: 181 LGSPINGIPREIXSFDITVASEimiLCLATNVEDLKERLANIVIGYRFDRSPVYVRDLEV 240

Query: 241 EGALTLILKDTIKPNLVQTIYGTPALVHGGPFANIAHGCNSVLATSTALRLADYWTEAG 300
+GAL LILK+ IKPNLVQTIYGTPA VHGGPFANIAHGGNSVLATSTALRLADY +TEAG
Sbjct: 241 QGALALILKEAIKPNLVQTIYGTPAFVHGGPFANIAHGCNSVLATSTALRLADYTITEAG 300

Query: 301 FGADLGAEKFLDIKTPNLPTSPDAIVIVaTLRALKMHGGVSKEDLSQENVEAVKRGFTNL 360 

FGADLGAEKFLDIK PNLPTSPDA+VIVAT+RALKM+GGV+K+ L+QENVEAVK GF NL 
Sbjct: 301 FGADLGAEKi'LDIKAPNLPTSPDAWIVATIRftLKMNGGVAKnALNQENV^^ 360 

Query: 361 ERHVMNMRQYGVPWVAINQFTADTESEIATLKTLCSNIDVAVELASVliffiD 420 

RHV NMR+YGVPWVAIN+F DT EIA L+ LC+ IDV VELASVW +GADGG++LA 
Sbjct: 361 ARHVENMRKYGVPVWAIlffiFITDTNDEIAVLiaJLCaAIDVPVEIJiSVWANGADGGTO 420 

Query: 421 QTVANVIETQSSNYKRLYNDEDTIEEKIKKIVTKIYGGNKVHFGPKAQIQLKEFSDNGWD 480 

T+ N IE S+YKRLY++ ++EEK+ +1 +IY +KV F KA+ Q+ + NGWD 
Sbjct: 421 ]m.INTIENOTSHYKRriYD]mLSVEEKVTEIAKEIYRADKVIFEKK^^ 480 

Query: 481 KMPIOflAKTQYSFSnNENLLGAPTDFDITVREFVPKTGaGFrVTiLTO^ 540

+PICMAKTQYSFSD4-P LLGAPT FDIT+RE VPK GAGFIVALTGDV+TMPGLPKKPA 
Sbjct: 481 NLPICMAKTQYSFSDDPKLLGAPTGFDITIRELVPKLGAGFIVALTGDVMTMPGLPKKPA 540 

Query: 541 ALNMDVLEDGTAIGLF 556 
ALDMDV DGTA+GLF

Sbjct: 541 AI^NMDVAADGTALGLF 556 
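
The summary line printed above each alignment (for example "Identities = 432/556 (77%), Positives = 493/556 (87%)") can be read programmatically. The following is a minimal sketch in Python, not part of the original analysis; the function name `parse_blast_summary` and the regular expression are illustrative assumptions about the line layout seen in this document.

```python
import re

# Assumed layout of the summary line, as it appears in this document:
#   "Identities = 432/556 (77%), Positives = 493/556 (87%)"
SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")

def parse_blast_summary(line):
    """Return {field: (count, aligned_length, printed_percent)} for one summary line."""
    return {
        name.lower(): (int(count), int(total), int(pct))
        for name, count, total, pct in SUMMARY.findall(line)
    }

stats = parse_blast_summary("Identities = 432/556 (77%), Positives = 493/556 (87%)")
matches, length, printed = stats["identities"]
fraction = matches / length  # identical residues divided by alignment length
print(f"identities: {matches}/{length} = {fraction:.3f} (printed as {printed}%)")
```

The sketch keeps the percentage as printed in the report alongside the raw counts, so the fraction can be recomputed from the counts without relying on how the report rounded it.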

A related DNA sequence was identified in S.pyogenes <SEQ ID 6199> which encodes the amino acid 
sequence <SEQ ID 6200>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 196 - 212 ( 196 - 212) 

Final Results 

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAB49329 GB:U39612 formyl-tetrahydrofolate synthetase
[Streptococcus mutans]
Identities = 432/556 (77%), Positives = 490/556 (87%)

Query: 1 MKBDIEIAQSVALQPITDIVKKVGIDGDDIELYGKYKAKLSFEKMKAVEANEPGKLILVT 60 
MK+DIEIAQSV L+PIT++VKK+GID DD+ELYGKYKAKL+P+K+KAVE N PGKL+LVT

Sbjct: 1 MKTDIEIAQSVDLRPITNVVKKLGIDFDDLELYGKYKAKLTFDKIKAVEENAPGKLVLVT 60 

Query: 61 AINPTPAGEGKSTMSIGLADAMQMGKKTMLALREPSLGPVMGIKGGAAGGGYAQVLPME 120 
AINPTPAGEGKST++IGLADALN++GKKTM+A+REPSLGPVMGIKGGAAGGGYAQVLPME 
Sbjct: 61 AINPTPAGEGKSTITIGLADAIJIKIGKKTMIAIREPSIfiPVMGIKiGGAAGGGYAQVLPME 120

Query: 121 DINLHPTODMHAITTANKALSALIDNHLQQGITOLGIDPRRIIWKRVLDrjQDRALRQVIVG 180 

DINLHFTGDMHAITTANNALSALIDNHL QGN+LGID RRIIWKRV+DLNDRALR V VG 
Sbjct: 121 DINLHFTGDMHAITTAMNALSALIDNHLHQGNELGIDQRRIIWKRVVDIOTRALRHVTVG 180 


Query: 181 LGSPVNGVPREDGFDITVASEIMAILCLATDLKDLKKRLADIWAYTYDRKPVYVRDLKV 240 

LGSP+NG+PREDGFDITVASEIMAILCLAT+ ++DLK+RLA+IV+ Y +DR PVYVRDL+V 
Sbjct: 181 LGSPINGIPREDGFDITVASEIMAILCLATNVEDLKERLANIVIGYRFDRSPVYVEDLEV 240 

Query: 241 EGALTLILKDAIKPNLVQTIYGTPALIHGGPFANIAHGCNSVLATSTALRLADYTVTEAG 300

+GAL LILK+AIKPNLVQTIYGTPA +HGGPFANIAHGCNSVLATSTALRLADYT+TEAG 
Sbjct: 241 QGALALILKEAIKPNLVQTIYGTPAFVHGGPFANIAHGCNSVLATSTALRLADYTITEAG 300 

Query: 301 FGADLGAEKFLNIKVPNLPKAPDAIVIVATLRALKMHGGVAKSDLAAENCEAVRLGFANL 360 
FGADLGAEKFL+IK PNLP +PDA+VIVAT+RALKM+GGVAK L EN EAV+ GFANL

Sbjct: 301 FG3U3LGAEKFLDIKAPNLPTSPDAVVIVATIRALKMNGGVAKnRIJSrQENVEAVK^ 360 

Query: 361 KEHVENtTOQFKVPVVVAINEFVBDTEAEIATLKALCEEIKTOVErj^VW^ 420 
RHVENMR+H- VPWVAINEF+ DT EIA L+ LC I VPVELASVWANGA+GG+ LA. 
Sbjct: 361 ARHVENMRKYGVFVV\aiNEPITDTSroEIAVLRNLCAAIDVPVELASVl»M 420

Query: 421 KTVVRVIDQEAADYKRLYSDEDTLEEKV-INIVTQIYGGKAVQFGPKAKTQLKQFAEFGWD 480

T++ 1+ + YKRLY + ++EEKV I +IY V F KAKTQ+ Q + GWD 
Sbjct: 421 OTLIOTIENNPSHYKRLYDNia.SVEEKVIEIAKEIYRiU3KVIFEKKAOT^ 480 

Query: 481 KLPVCMMCTQYSFSDNPSLICAPTDFDITIREFVPKTGAGFIVGLTGDVMTMPGLPKVPA 540 

LP+CMaKTQYSFSD+P LLGAPT FDITIRE VPK GAGFIV LTGDVMTMPGLPK PA 
Sbjct: 481 NLPICMAKTQYSFSDDPKLLCSAPTGFDITIRELVPKLGaGPIVaLTGDVMTMPGLPKKPA 540 

Query: 541 AMAMDVAENGTALGLF 556 

A+ MDVA +GTALGLF 
Sbjct: 541 ALNMDVAADGTALGLF 556 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 452/556 (81%) , Positives = 513/556 (91%) 

Query: 1 MKTDIEIAQSVALKPIAEIVEQVGIGFDDIELYGKYKAKLSFDKIEAVKSQKVGKLILVT 60 

MK+DIEIAQSVAL+PI +IV++VGI DDIELyGKYKAKLSF+K++AV++ + GKLILVT 
Sbjct: 1 MKSraElAQSVRLQPITDIVKKVGIDGDDIELYGKYKaiOjSFEKMKAVEfiNEPGKLILTO 60 

Query: 61 AINPTPAGEGKST^ISIGUiDAIi^^aGKKTMIAIlREPSLGPVMGIKGGaAGGGYAQVLPME 120 

AINPTPAGEGKSTMSIGIADALN++GKKTM+ALREPSLGPVMGIKGGAAGGGYAQVLPME 
Sbjct: 61 AINPTPAGEGKSTMSIGIADAMQMGraCTMLALREPSLGPVMGIKGGAAGGGYAQVLPME 120 

Query: 121 DIIKjHFTGDMHAITTAiqilALSALLDOTIHQGNELDIDQRRVIWKRVVDIM^ 180 

DINIiHFTGDMHAITTAlilNALSaL+niSlH+ QGN+L ID RR+IWKRV+DIiNDRAIiRQVIVG 
Sbjct: 121 DINLHFTGbMHAITTANNRLSALIDNHLQQGiroM 180 

Query: 181 LGSPVM3IPRBDGFDITOASEIMAILCLATDLSDLKKRLSNIVVAYSRMRKPIYVKDLKI 240 

LGSPVNG+PREDGFDITVASEIMAILCLATDL DLKKRL++IVVaY+ +RKP+YV+DLK+ 
Sbjct: 181 LGSPVNGVPREDGFDITVASEIMAILCIATDLKDLKKRIMIVVAYTTORKPV^^ 240 

Query: 241 EGALTLILKDTIKPNLVQTIYGTPALVHGGPFANIAHGCNSVLATSTALRLADYWTEAG 300
EGALTLILKD IKPNLVQTIYGTPAL+HGGPFANIAHGCMSVXiATSTALRLaDY VTEAG 
Sbjct: 241 EGRLTLILKDAIKPNLVQTIYGTPALIHGGPFANIAHGCNSVIATSTALRIiRDyrVTEAG 300 

Query: 301 PGaDIX3MKI'IlDIK^P^mPTSPDAIVIVATLRALKMHGGVSKEDLSQENVEAVKR6FT^ 360 

FGADLGAEKFL+IK PULP +PDAIVIVATLRALKMHGGV+K DL+ EN EAV+ GF NL 
Sbjct: 301 FGADIXSAEKFUStlKVPNLPKAPDAIVIVATLRALKMHGGVAKSDLAAENCEAVRLGFANL 360 

Query: 361 ERHVlM^RQYGVPVWAINQFTADTESEIATLKTLCSNID'TAVELASVWEDGADGGLEIiA 420 

+RHV ISIMRQ+ VPVWAIN+F ADTE+EIATLK LC I .V VELASVW +GA+GGL LA 
Sbjct: 361 KRHVEIOTIQFKVPVVVAINEFVADTEAEIATLKALCEEIKVPVEI^WANGA^^ 420 

Query: 421 QWAOTIETQSSOTKRLYJSTOEOTIEEKIKKIVTKIYGGNKVHFGPKAQIQLKEFSDNGWD 480 

+TV VI+ ++++YKRLY+DEDT+EEK+ IVT+IYGG V FGPKA+ QLK+F++ GWD 
Sbjct: 421 KTVTOVIDQEAaDYKELYSDEDTLEEKVINIVTQIYGGKAVQFGPkAKTQLKQFAEFGWD 480 

Query: 481 KMPICMAKTQYSFSDNPNLLGAPTDFDITVREFVPKTGAGFIVALTGDVLTMPGLPKKPA 540 

K+P+CMAKTQYSFSDNP+LLG&PTDFDIT+REFVPKTGAGFIV LTGDV+TMPGLPK PA 
Sbjct: 481 KLPVCMaKIQYSFSDNPSLLGAPTDFDITIREFVPKTGAGFIVGLTGD\mrMPGLPKVPA 540 

Query: 541 ALNMDVLEDGTAIGLF 556 

A+ MDV E+GTA+GLF 
Sbjct: 541 AMAMDVAENGTALGLF 556 

A related DNA sequence was identified in S.pyogenes <SEQ ID 9057> which encodes amino acid sequence 
<SEQ ID 9058>. Analysis of this protein sequence reveals the following: 

Possible site: 13 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 516 - 532 ( 516 - 533) 



Final Results

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 

Score = 604 bits (1540) , Expect = e-174 

Identities = 304/555 (54%), Positives = 389/555 (69%), Gaps = 2/555 (0%) 





Query: 4 [sequence illegible in source]
+DIEIA SV ++PI+++ +Q+GI + + LYGKyKAK+ ++ A+K++ GKLILVTAI
Sbjct: 3 TDIEIAQSVALKPIAEIVEQVGIGFDDIELYGKyKAKLSFDKIEAVKSQKVGKLILVTAI 62

Query: [sequence illegible in source]
+PTPAGEGK+T S+GL DaL+ IGKK +IALREPSL PMEDI
Sbjct: 63 NPTPAGKGKSTMSIGLMl&imiGKKTMIALREPSLGPVMGIKGGAAGGGYAQV^ 122

Query: 124 [sequence illegible in source]
NLHFTGD HAI ANN L+AL+DNHIH GN L ID RR+ WKRWD+NDR LR ++ GL
Sbjct: 123 NIJIFTGDMHAITTANNaLSALLDtraiHQGNELDIDQRRVIWKRVVDLN^ 182

Query: [sequence illegible in source]
VNGIPREDG+DITVASEIMAILCI1+ ++SDLK RIi I++ Y+ +P+
Sbjct: 183 SPVNGIPREDGFDITVASEIMAILCLATDLSDLKKRLSNIVVAYSRNRKPIYVKDLKIEG 242

Query: [sequence illegible in source]
I PNLVQT+ TPAL+HGGPFANiaHGCNSVLAT AL+ DY VTEAGFG
Sbjct: 243 ALTLILKDTIKPNLVQTIYGTPALVHGGPFANIAHGCNSVLATSTALRLADYWTEftGFG 302

Query: [sequence illegible in source]
ADLGftEKF+DIK P A+V+VAT+RALKMHGGV K DL+ ENV+AV G NL++
Sbjct: 303 ADLGaEKFLDIKTP^^^PTSPnAIVIVATLRALK^fflGGVSKEDLSQENVEAVKRGFTNLER 362

Query: [sequence illegible in source]
[match line illegible in source]
Sbjct: 363 HVNlilMRQ-YGVPVVVAINQFTADTESEIATLKTLCSNIDVAVELASVWED^^ 421

Query: 424 KVVTLAE-QDNQFRBVYEEDDSIETKLTKIOTKVYGGKGIOTiSSATUOlELADLERLGFG^ 482
V + E Q + ++ +Y ++D+IE K+ KIVTK+YGG ++ A+ +L + G+
Sbjct: 422 TVaOTIETQSS]SmCRLYm)EDTIEEKIKKIWKIYGGNKVHEGPKAQIQLKEFSDNGWt)K 481

Query: 483 YPICMAKTQYSFSDDAKKLGAPTDFTVTISNLKVSAGAGFIVALTGAIMTMPGLPKVPAS 542
PICMAKTQYSFSD+ LGAPTDF +T+ GAGFIVALTG ++TMPGLPK PA+
Sbjct: 482 MPICI^KTQYSFSDNPNLLGAPTDFDITVREFVPKTGAGFIVALTGDVLTMPGLPKKPAA 541

Query: 543 ETIDIDEEGNITGLF 557
+D+ E+G GLF
Sbjct: 542 USlMDVLEDGTAIGIiF 556





SEQ ID 6198 (GBS131) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 6; MW 64.8kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 35 (lane 4; MW 90kDa).

GBS131-GST was purified as shown in Figure 201, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2005 

A DNA sequence (GBSx2115) was identified in S.agalactiae <SEQ ID 6201> which encodes the amino
acid sequence <SEQ ID 6202>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.03 Transmembrane 34 - 50 ( 29 - 56)
INTEGRAL Likelihood = -7.70 Transmembrane 90 - 106 ( 84 - 110)
INTEGRAL Likelihood = -1.97 Transmembrane 62 - 78 ( 62 - 78)
INTEGRAL Likelihood = -0.69 Transmembrane 275 - 291 ( 275 - 291)
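
The INTEGRAL lines above each report one predicted transmembrane segment with its likelihood value and residue range. A minimal Python sketch for extracting them follows; it is an illustration only, and the name `parse_tm_segments` and the pattern are assumptions based on the line layout shown here. The ordering step assumes that more negative likelihood values mark stronger predictions, which matches the order in which the report lists the segments but is not stated in the text.

```python
import re

# Assumed layout of one report line, as printed in this document:
#   "INTEGRAL Likelihood =-10.03 Transmembrane 34 - 50 ( 29 - 56)"
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_tm_segments(report):
    """Return (likelihood, start, end) tuples, most negative likelihood first."""
    segments = [(float(s), int(a), int(b)) for s, a, b in TM_LINE.findall(report)]
    return sorted(segments)

report = """
INTEGRAL Likelihood =-10.03 Transmembrane 34 - 50 ( 29 - 56)
INTEGRAL Likelihood = -7.70 Transmembrane 90 - 106 ( 84 - 110)
INTEGRAL Likelihood = -1.97 Transmembrane 62 - 78 ( 62 - 78)
INTEGRAL Likelihood = -0.69 Transmembrane 275 - 291 ( 275 - 291)
"""
segments = parse_tm_segments(report)
for likelihood, start, end in segments:
    print(f"residues {start}-{end}: likelihood {likelihood}")
```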



Final Results

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA88609 GB:M37842 unknown protein [Streptococcus mutans]
Identities = 243/373 (65%), Positives = 302/373 (80%), Gaps = 1/373 (0%)

Query: 71 IGAVLYLVNSEMDALSRVTWLILVMIAPLLGAMFLMYTKFDWGYRGLKQRLETLIDESQI 130 
IG+VLYLVNS+MD LS +TWL++++ P+LG +FL+YTK DWGYR LK ++ +

Sbjct: 2 IGSVLYLVNSQMDTLSIITWLLVILPFPILGTLFLIYTKQDWGYRELKSLIJCKSTQAIKP 61 

Query: 131 YLEDDPETLNQLKSSTSTTYHLVQYFEKaHGNFPVYRNTDVTFLPTGEAFFEKMKEELLK 190 

Y + D L +LK S + TY+L QY ++ G FPVY+NT VT+ P G++ FE+MK++LLK 

Sbjct: 62 YFQYDQRILYJCLKESHARTYNLAQYIjEIRS-GGFPVYKNTKVTYFPNGQSKFEEMK^ 120

Query: 191 AKKYIFLEFFIIDEGIMWGEILSILEQKVEEGVEVRILYDGMIEITKLSFDYTKRLEKIG 250 

A+K+IFLE+FII EG+MWGE1LSILEQKV+EGVEVR++YDGM+E++ LSFDY KRLEKIG 
Sbjct: 121 AEKFIFLEYFIIAEGLMWGEILSILEQK7QEGVEVRVMYDGMLELSTLSFDYAKRLEKIG 180 


Query: 251 IKAKAFSPISPFISTYYNYRDHRKIV\n:DGVVGMTGGVNLaDEYINHIELFGHWKDSGIM 310 

IKAK FSPI+PF+STYYNYRDHRKI+VID V GG+NLADEYIN IE FG+WKD+ +M 
Sbjct: 181 IKAKVFSPITPFVSTYYNYRDHRKILVIDNKVAENGGINLADEYINQIERFGYWKDTAVM 240 

Query: 311 LKGKAVDSFLLLFLQMWSITEEKMLVAPYLGVHDDLVENEGYVIPYGDSPLDTDKVGENV 370

L+G+ V SF L+FLQMWS T + APYL + + GYVIPY DSPLD +KVGENV 
Sbjct: 241 LEGEGVASFTLMFICMWSTTNKDYEFAPYLTQNFHEXVANGYVIFYSDSPLDHEKVGENV 300 

Query: 371 YIDILNHaREYVYIMTPYLILDSELEHAIQFAAERGVDVRIIMPGIPDKPIPYALAKTYY 430 
YIDILN AR+YVYIMTPYLILDSE+EHA+QFAAERGVDV+IIMPGIPDK +P+AI1AK Y+

Sbjct: 301 YIDIIJlJQaRDYVYIMTPYLIIDSEMEHaLQFAAERGVDVTCIIMPGIPDKKVPFAIiAKRYF 360 

Query: 431 QALTKSGVKIYEY 443 
AL +GVKIYE+ 
Sbjct: 361 PALLDAGVKIYEF 373

A related DNA sequence was identified in S.pyogenes <SEQ ID 6203> which encodes the amino acid 
sequence <SEQ ID 6204>. Analysis of this protein sequence reveals the following: 

Possible site: 47 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.86 Transmembrane 84 - 100 ( 81 - 104) 
INTEGRAL Likelihood = -8.33 Transmembrane 28 - 44 ( 23 - 49) 
INTEGRAL Likelihood = -6.74 Transmembrane 56 - 72 ( 53 - 74) 

Final Results

bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAA23240 GB:J02911 formyltetrahydrofolate synthetase (FTHFS)
(ttg start codon) (EC 6.3.4.3) [Moorella thermoacetica]
Identities = 350/557 (62%), Positives = 438/557 (77%), Gaps = 2/557 (0%)

Query: 2 VLSDIEIANSVT^ffiPISKVADQL6IDKEALCLYGKYKAKIDARQLVALKNKPDGKLILVT 61

V SDIEIA + M+P+ ++A LGI ++ + LYGKYKAKI LK+KPDGKLILVT 
Sbjct: 4 VPSDIEIAQAAKMKPVMELARGLGIQEDEVELYGKYKAKISLDVYRRLKDKPDGKLILVT 63 



Query: 62 AISPTPAGEGKTTTSVGLVDALSAIGKKAVIALREPSLGPVFGVKGGAAGGGHAQWPME 121
AI+PTPAGEGKTTTSVGL DAL+ +GK+ ++ LREPSLGP FG+KGGAAGGG+AQWPME
Sbjct: 64 AITPTPAGEGKTTTSVGLTDALARLGKRVMVCIiREPSLGPSFGIKGGAAGGGYAQVVPME 123

Query: 122 DINLHFTGDFHAIGVAmLIJ^IDNHIHHGNSLGIDSRRITWKRVVDiyiNDRQr.RHIVDG 181 
DINLHFTGD HA+ A+NLLftA++DNH+ GN L ID R ITW+RV+D+NDR IiR+IV G

Sbjct: 124 DiraaPTGDIHAVTYAHNLIMMVDNHLQQGNVLNIDPRTIT^ 183 

Query: 182 LQGKVNGIPREDGYDITVASEIMAILCLSENISDLKARLEKIIIGYNYQGEPVTAKDLKA 241 
L GK NG+PRE G+DI+VASE+MA LCL+ ++ DLK R +I++GY Y G+PVTA DL+A 
Sbjct: 184 LGGKflNGVPRETGFDISVASEVMACLCLASDLMDLKERFSRIWGYTYDGKPVTAGDLEA 243

Query: 242 GGALAALLKDAIHPHtiVQTLEHTPALIHGGPFANIAHGOISVLATKLALKyGDyAVTEAG 301 

G++A L+KDAl ENLVQTLE+TPA IHGGPFANIAHGCNS++ATK ALK DY VTEAG 
Sbjct: 244 QGSMALLMKDAIKI>NLVQTLEOTPAFIHGGPFANIAHGC3SISIIATKTA^ 303 


Query: 302 FGADLGAEKFIDIKCRMSGLRPAAWLVATIRALKMHGGVPKADLATENVQAWDGLPNL 361 

FGADLGAEKF D+KCR +G +P A V+VAT+RALKMHGGVPK+DLATEN++A+ +6 NL 
Sbjct: 304 FGADLGAEKFYDWCaiYAGFKPKiTVIVATVRALKMHGGVT'KSDIiATEKrLEALREGFJ^ 363 

Query: 362 DKHLRNIQDVYGLPVVVAINKFPLDTDAELQAVYnACEKRGVDWISDVWANGGft^ 421

+KH+ NX +G+P WAIN FP DT+AEL +y+ C K G +V +S+VWA GG GG EL 
Sbjct: 364 EKHIENI-GKFGVPAVVAISIAFPTr)TEAEL^mI)YELCaKAGaffiVALSEVWAKGGEGGLEL 422 

Query: 422 AEKW-TLAEQDNQFRFVYEEDDSIETKLTKIVTKVYGGKGINLSSAAKRELRDLERLGF 480 

A k:v+ tl + + f +y d si+ k+ ki t++yg g+n ++ a + + e lg+

Sbjct: 423 ARKVLQTLESRPSNFHVLYNLDLSIKDKIAKIATEIYGADGVNYTAEADKAIQRYESLGY 482 

Query: 481 GNYPICMAKTQYSFSDDAKKLGAPTDFTVTISNLKVSAGAGFIVRLTGAimMPGLPK^ 540 
GN P+ MAKTQYSFSDD KLG P +FT+T+ +++SAG IV +TGAIMTMPGLPK P 
Sbjct: 483 GNLPVVMAKTQYSFSDDMTKLGRPRNFTITVREVRLSAGGRLIVPITGAIMTMPGLPKRP 542

Query: 541 ASETIDIDEEGNITGLF 557 

A+ IDID +G ITGbF 
Sbjct: 543 AACNiblDADGVITGLF 559 
!GB:M37842 unknown protein [Streptococcus mutans] (v. . . 517 e-145

>GP:AAA88609 GB:M37842 unknown protein [Streptococcus mutans] 
Identities = 246/370 (66%), Positives = 303/370 (81%), Gaps = 1/370 (0%) 

Query: 68 VLYLVNSDMDAISRMTWLILIMIAPLLGSLFLIYTKLDWGYRGLKQRINHLVDLSAPYLS 127

VLYLWS MD +S +TWL++I+ P+LG+LFLIYTK DWGYR LK I PY 
Sbjct: 5, VLYLVNSQMDTLSIITWLLVILPFPILGTLFLIYTKQDWGYRELKSLIKKSTQAIKPYFQ 64 

Query: 128 DDiaiLEVLKDSTSTTYHLVQYLERSRGNFPIYlilOTRVTYFPTGETFFDSLKEQLFLAKK 187 
D IL LK+S + TY+L QYL RS G FP+Y NT+VTYFP G++ F+ +K+QL A+K

Sbjct: 65 YDQRILYKLKESHARTYlSn^YLHRS-GGFPVYKNTKVTYFENGQSKPEE^ 123 

Query: 188 YIFLEFFIIAEGQMWGEILSILEKKVSEGVEVRVLFDGMNELSTLSSDYAKRLEQIGIKA 247 
+IFLE+FIIAEG MWGEILSILE+KV EGVEVRV++DGM ELSTLS DYAKRLE+IGIKA 
Sbjct: 124 FIFLEYFIIAEGLMWGEILSILEQKVQEGVEVRVMYDGMLELSTLSFDYAKRLEKIGIKA 183

Query: 248 KSFLPISPFISTYYlSnniDHRKIWIDGEVSFTGGINLaDEyiNEVERPGHWKDAGI^ 307 

K F PI+PF+STYYNYRDHRKI+VID +V+F GGIliILADEYIN++ERFG+WKD +MLEG 
Sbjct: 184 KWSPITPFVSTYYiramHRKILVIDinTOaFNGGINLaDEYINQIERFGYWKDTA\MLEG 243 


Query: 308 EATDSFLILFLQMWSITEKELIIDPYLSDHSLKLPSDGYVIPYGDSPLDTDKIGKNVYID 367 

E SF ++FLQMWS T K+ PYL+ + ++ ++GYVIPY DSPLD +K+G+NVYID 
Sbjct: 244 EGVASFTLMFLQtmSTOmDYEFAPYLTQNFHEIVANGYVIPYSDSPLDHEKVGENWID 303 

Query: 368 lENHAKEYVYim'PYLIIJJSEMEHRLRFASERGOTIRIIMPGVPDKiSVPYALftK^^ 427

im A++YVYIMTPYLII1DSEMEHAL+FA+ERGVD++IIMPG+PDK VP+ALAK Y+ AL 
Sbjct: 304 IIJIQARDYVYIMTPYLILDSEKIEHALQFAAERG\7DVKIIMPGIPDKKVPFALAKRYFPAL 363 

Query: 428 MSSGVKIYEY 437 
+ +GVKIYE+

Sbjct: 364 liDAGVKIYEF 373 



An alignment of the GAS and GBS proteins is shown below.

Identities = 362/524 (69%), Positives = 437/524 (83%)

Query: 8 LISNKVKITOLimSKKSLLRGIFSRTWIAILLILQLLFLLASYSWLEQYRVWLATVEH 67 

+1 K K+ LL+K K LRGIPSRTT+I +L+ILQL+FL SY+W+EQYRVW+ +E 
Sbjct: 2 IIKKKAKVKYLLHKGKHGFLRGIFSRTTIIVLLIILQLVFLFQSYAWMEQYRVWITILES 61 

Query: 68 ILTICSAVLYLWSEMDALSRVTWLILVMIAPLLGfiMFLMYTKFDWGyRGLKQRLETLIDE 127 

+ I VLYLVNS+MDA+SR+TWLIL+MIAPLI1G++FL+YTK DWGYRGLKQR+ L+D 
Sbjct: 62 VFAITIVLYLWSDMDAISRMTWLILIMIAPLI/3SLFLIYTKIJ3WGYRGLKQRIimjVDL 121 

Query: 128 SQIYLEDDPETIiNQLKSSTSTTYHLVQYFEKaHGNFPVYRlirrDOTFLPTGEAFFEKMKEE 187 

S YL DD L LK STSTTYHLVQY E++ GNFP+Y NT VT+ PTGE FF+ +KE+ 
Sbjct: 122 SAPYLSDDDAILEVLKDSTSTTYHLVQYLERSRGNFPIYMNTRVTYPPTGETFFDSLKEQ 181 

Query: 188 LLKAKKYIFLEFFIIDEGIMWGEILSILEQKVEEGVEVRILYDGMIEITKLSFDYTKRLE 247 

L AKKYIFLEFFII EG MWGEILSILE+KV EGVEVR+L+DGM E++ LS DY KRLE 
Sbjct: 182 LF1AKKYIFLEFFIIAEGQ^OTGEILSILEKKVSEGVETOVLFDGMNELSTLSSD 241 

Query: 248 KIGIKAKAFSPISPFISTYYNYRDHRKIWIDGVVGMTGGVHLADEYINHIELFGHWKDS 307 

+IGIKAK+F PISPFISTYYNYRDHRKIWIDG V TGG+NLftDEYIN +E FGHWKD+ 
Sbjct: 242 QIGIKAKSFLPISPFISTYYiraiDHRKIWIDGEVSFTGGINLADEYIEIEVERPGHWKDA 301 

Query: 308 GIMLKGKAVDSFLLLFLQMWSITEEKMLVAPYLGVHDDLVENEGYVIPYGDSPLDTDKVG 367 

G+ML+G+A DSFL+LFLQMWSITE+++++ PYL H + ++GYVIPYGDSPI1DTDK+G 
Sbjct: 302 GLMLEGEATDSFLILFLQMWSITEKELIIDPYLSDHSLKLPSDGYVIPYGDSPLDTDKIG 361 

Query: 368 EJmiDILNHaREYVYIMTPYLILDSELEHAIQFAAERGVDVRIIMPGIPDKPIPYALAK 427 

+NVYIDm!IHA+EYVYIMTPYLILDSE+EHA++FA+ERGVD+RIIMPG+PDK +PYALAK 
Sbjct: 362 Kim-IDIUQHAKEYVYIMTPYLILDSEMEHALRFASERGVDIRIIMPGVPDKGVPYAL^ 421 

Query: 428 TYYQALTKSGVKIYEYTLGFVHSKIFLSDNTKAWGTINLDYRSLYHHFECAVYLYKVDA 487 

TYY+AL SGVKIYEY GFVHSK+F+SDNTKAWGTINLDYRSLYHHFECA YLY+V 
Sbjct: 422 TYYKAMSSGVKIYEYQPGFVHSKVFISDNTKAWGTINLDYRSLYHHFECATYLYRVSV 481 

Query: 488 IQDIYRDYMJTLNKSRLVSLKDIOTIPKFQKVIGIVTKTIAPLL 531 

I DI D+ + +S L++ + P +QK+IG++ + lAPLL 
Sbjct: 482 rWDIVNDENEAQKQSLLMTSDHLTQRPWYQKLIGLLVRIIAPLL 525 

A related GBS gene <SEQ ID 8953> and protein <SEQ ID 8954> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 6 
McG: Discrim Score: -8.80 
GvH: Signal Score (-7.5): -1.94 

Possible site: 53 
>>> Seems to have no N-terminal signal sequence 
ALOM program count: 4 value: -10.03 threshold: 0.0 

INTEGRAL Likelihood =-10.03 Transmembrane 34 - 50 ( 29 - 56)
INTEGRAL Likelihood = -7.70 Transmembrane 90 - 106 ( 84 - 110)
INTEGRAL Likelihood = -1.97 Transmembrane 62 - 78 ( 62 - 78) 
PERIPHERAL Likelihood = 1.22 199 
modified ALOM score: 2.51 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

32.5/57.2% over 498aa 

Bacillus firmus 

SP|O66043| CARDIOLIPIN SYNTHETASE (EC 2.7.8.-) (CARDIOLIPIN SYNTHASE) (CL SYNTHASE). 

Insert characterized 

GP|2952028|gb|AAC05444.1||U88888 cardiolipin synthase Insert characterized
ORF01572(409 - 1893 of 2193)

SP|O66043|CLS_BACFI(5 - 503 of 503) CARDIOLIPIN SYNTHETASE (EC 2.7.8.-) (CARDIOLIPIN
SYNTHASE) (CL SYNTHASE). GP|2952028|gb|AAC05444.1||U88888 cardiolipin synthase {Bacillus
firmus}

%Match =17.9 

%Identity =32.5 %Similarity = 57.1 

Matches = 162 Mismatches = 204 Conservative Sub.s = 123 



153 183 213 243 273 303 333 363

OTjQLSIVroiF*KTVQPLDYFK**RGRACDASLFIjI,GIRF*LEII*NNRM]:iF^ 

393 423 447 477 507 528 558 588 

SKKSLLRGIFSRTTVIAII,LILQLLF--LLa.SYSWLEQYRVWIATVEHILT---IGAVLYLVNSEMDALSRVTWLILVMI 
: ,| , III I I , I ,| |, II :::: I :|||:::

MKNRIjNVIAFFALLFAALYISRGFLQSWMVGTLSWFTLSVIFIGIIIFFEN--RHPTKTLTWLLVIJ^ 
10 20 30 40 50 60 



618 648 678 705 735 765 789 819 

APLLGftMFIjMYTKFDWGYRGLKQRLETLIDESQIYLE-DDPETLNQLKSSTSTTYHLVQyFEKAH- -GNFPVYRNTDVTF

|::| |: I I =1 |: : |:: : : : : I): : | I jj ||: ::: 

FPVVG--FFFYIJ«FGQNHRKSKRFSKKAIEDERAFQKIEGQRQIjNE-EQLK™GGHQQLLFRIiRHKrC 

80 90 100 110 120 130 140 

849 879 909 939 969 999 1029 1059

LPTGEAFFEKMKEELLKZOCKYIFLEFFIIDEGIlyWGEILSlIiEQKOTlEGVEVRILYDGMIEITKLSFDYTKRLEKIGIKA 

I |: : : : I |: :| : H II I =111 II llll = III I : I 1=: 

LTDGKETYAHILQALKMAEHHIHLEYYIVRHDDLGNQIKDILISKZiKEGVHVRFLYDG-VGSWKLSK^ 

160 170 180 190 200 210 220 


1086 1116 1146 1176 1206 1236 1266 1293 

KAFSPIS-PFISTYY]ra^DHRKIWIDGWGMTGGVNLADEYIMHIELFGHWKDSGIMLKGKA.VDSFLLLFLQMWSI-TE 

= 111: II:: | | 1 : | || | : | | | | | | | ||:|: \\\: ||:|:|: : ::|:|| =: |:||l 1 I 

VSFSPVTCLPFLTHTINYRNERKIIVIDGWGFVGGLNIGDEYICKDAYFGYWRDTHLYTOGEATOTLQIiIF^^ 
240 250 260 270 280 290 300

1323 1353 1383 1413 1443 1473 1503 1533 

EKMDmPYLGVHDDLVEIffiGYVIPYGDSPLDTDKTOENVYIDIIMIAREYVYIMrPYLILDSELEH^ 



ETIIJIQTYLSPSLSMTKGDGGVQMIASGPDTRliffiVNKKIiFFSMITSAKKSIWIASPYFIPDDDILSALKIAALSGIDVRI

320 330 340 350 360 370 380 



1563 1593 1623 1653 1683 1713 1743 1773 

IMPGIPDKPIPYALAKTYYQALTKSGVKIYEYTLGFVHSKIFLSDNTKAWGTINLDYRSLYHHFECAWLYKVDAIQDI 

=:| nil: :::|: I ::|||:||l Ihllll : h I :|l 1 = 1 Ih: :|l == =

LVPNRPDKRIVFHASRSYFPELLEAGVKVYEYireGFMHSKI I IVDHEIASIGTSNMDMRSFHLNFEVNAYLYRTSSVTKL 
400 410 420 430 440 450 460 



1803 1833 1863 1893 1923 1953 1983 2013 

YRDYMDTmKSRLVSLKDimjIPKFQKVIGIVTKTIAPLL*K*FIFNLILKra*RI*LYLKSKBCILTKLC*TTVm
, II: I I ::: | | |:::| :: ::||| 
VSDYVYDLEHSNQINFSLFKNRPFFHRLIESTSRLLSPLL 
480 490 500 

SEQ ID 8954 (GBS277d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 150 (lane 18; MW 51kDa), in Figure 151 (lanes 17 & 18; MW 51kDa) and in
Figure 182 (lane 12; MW 51kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE
analysis of total cell extract is shown in Figure 151 (lanes 15 & 16; MW 76kDa) and in Figure 58 (lane 5;
MW 87kDa).

GBS277d-His was purified as shown in Figure 235, lane 8.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2006

A DNA sequence (GBSx2116) was identified in S.agalactiae <SEQ ID 6205> which encodes the amino 
acid sequence <SEQ ID 6206>. This protein is predicted to be aspartate-semialdehyde dehydrogenase. 
Analysis of this protein sequence reveals the following: 
Possible site: 42 

>>> Seems to have an uncleavable N-term signal seq



A related GBS nucleic acid sequence <SEQ ID 9831> which encodes amino acid sequence <SEQ ID 9832>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA26850 GB:J02667 aspartate beta-semialdehyde dehydrogenase (EC
1.2.1.11) [Streptococcus mutans]
Identities = 261/357 (73%), Positives = 304/357 (85%), Gaps = 1/357 (0%)



Query: 1 MGYTTOIVGATGAVGTQMIRQLEQSNLPIEQVKLLSSSRSAGKIIiHFKDEAIRVEETTKE 60
MGYTVAIVGATGAVGT+MI+QLEQS LP+++V+LLSSSRSAGK+ri +KD+ + VE TTK+
Sbjct: 1 MGYT\aiVGATGaVGTRMIQQLEQSTLPVDKVRLLSSSRSAGKVLQVKDQDVTVELTTKD 60

Query: 61 SFYDVDIALFSAGGSISAKFAPYAVKSGAVWDNTSYFRQNPDVPLWPEVNAHAMIGHN 120
SP VDIALFSAGGS+SAKFAPYAVK+GAVWDNTS+FROaPDVPIiWPEVNA+AM HN
Sbjct: 61 SFEAVDIALFSAGGSVSAKFAPYAVKAGAVVVIm'SHFRGNPDVPLVVPEVMAYA^C>a^ 120

Query: 121 GIIACPNCSTIQMMIALEPIRQKWGIERVIVSTYQAVSGSGARAVEETKEQLRQVLNDNL 180
G1IACPNCSTIQMM+ALEPIRQKWG+ RVIVSTYQAVSG+G A+ ET ++++V+ND +
Sbjct: 121 GIIACPNCSTIQ^MVALEPIRQKWGLSRVIVSTYQAVSGAGQSA1NETVREIKEVVNDGV 180

Query: 181 SPDQLIATVLPCSSDQKHYPIAFNALPQIDIFTDNDYTYEEMKMTLETKKIMEDATIKVS 240
P + A + P D+KHYPIAFNAL QID+FTDNDYTYEEMKMT ETKKIME+ + VS
Sbjct: 181 DPKAVHADIFPSGGDKKHYPIAFNAIAQIDVFTDNDYTYEEMKMTNETKKIlyE^ 240

Query: 241 ATCTOIPVLSGHSESiyiETKELASISEIKKAIANFPGAVI<3DLPSQQIYPQAINAVGHR 300
A CVR+P+L HSE++YIETK++A I E+K AIA FPGAVL+D QIYPQA NAVG R
Sbjct: 241 AHCTOVPILFSHSEAWIETKDVAPIEEVKAAIAAFPGAVLEDDIKHQIYPQRANAVGSR 300

Query: 301 ETFVGRIRKDLDQENGVHMWWSDNLLKGAAWNSVQIAETLHKNGLVKPAKELKFEL 357
TFVGRIRKDLD ENG+HMWWSDNLLKGAAWNS+ A LH+ GLV+ ELKFEL
Sbjct: 301 -TFVGRIRKDLDIENGIHMWWSDNLLKGAAWNSIITANRLHERGLVRSTSELKFEL 356



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2007 

A DNA sequence (GBSx2117) was identified in S.agalactiae <SEQ ID 6207> which encodes the amino 
acid sequence <SEQ ID 6208>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.66 Transmembrane 33 - 49 ( 33 - 49) 



Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 500. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2008 

A DNA sequence (GBSx2119) was identified in S.agalactiae <SEQ ID 6209> which encodes the amino 
acid sequence <SEQ ID 6210>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3853 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
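
Each "Final Results" block lists a certainty value for the three candidate locations. As an illustration only, the following Python sketch reads such a block and reports the location with the highest certainty; the name `parse_final_results` and the pattern are assumptions based on the layout used in this document, and treating the highest certainty as the predicted location is an assumption of the sketch rather than a statement from the text.

```python
import re

# Assumed layout of one result line, as printed in this document:
#   "bacterial cytoplasm --- Certainty=0.3853 (Affirmative) < succ>"
RESULT = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)

def parse_final_results(block):
    """Return (location, certainty, verdict) tuples in the order they are listed."""
    return [(loc, float(c), verdict) for loc, c, verdict in RESULT.findall(block)]

block = """
bacterial cytoplasm --- Certainty=0.3853 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
best = max(results, key=lambda r: r[1])  # location with the highest certainty
print(best)
```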

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2009 

A DNA sequence (GBSx2120) was identified in S.agalactiae <SEQ ID 6211> which encodes the amino
acid sequence <SEQ ID 6212>. This protein is predicted to be unnamed protein product (clpP). Analysis of
this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3883 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10061> which encodes amino acid sequence <SEQ ID
10062> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 6213> which encodes the amino acid 
sequence <SEQ ID 6214>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2682 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 175/196 (89%), Positives = 187/196 (95%)

Query: 5 MIPWIEQTSRGERSYDIYSRLLKDRIIMLTGQVEDNMANSIIAQLLFLDAQDNTKDIYL 64

MIPWIEQTSRGERSYDIYSRLLKDRIIMLTG VEDNMANS+IAQLLFLDAQDNTKDIYL 
Sbjct: 1 MIPWIEQTSRGERSYDIYSRLLKDRIIMLTGPVEDNMftNSVIAQLLFLDAQDNTKDIYL 60 

Query: 65 YVOTPGGSVSAGLAIVDTMNPIKSDVQTIVMGMAASMGTIIASSGaKGKEPMLPISM 124

YVNTPGGSVSAGLAIVDTMNFIK+DVQTIVMGMftASMGT+IASSG RGKRFMLPNAEYMI 
Sbjct: 61 YVOTPGGSVSftGLAIVDTMNFIKADVCJTIVMGMRASMGWIASSGTKGKRFMLPN^ 120 

Query: 125 HQPMGGTGGGTQQSDMAIAAEHLLKTRHTLEKILADNSGQSIEKVHDDAERDRWMSAQET 184 
HQPMGGTGGGTQQ+DMAIAAEHLLKTRH LEKIIiA N+G++I+++H DAERD WMSA+ET

Sbjct: 121 HQPMGGTGGGTQQTDrmiAAEHLLKTRHRLEKILAQNAGKTIKQIHKiaAERDVWMSaEET 180 

Query: 185 LDYGFIDAIMENNNLQ 200 
L YGFID IMENN L+ 
15 Sbjct: 181 LAYGFIDEIMENNELK 196 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
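
The summary line printed above each alignment (for instance, Identities = 175/196 (89%) for the GAS/GBS alignment of this Example) can be checked mechanically. The following Python sketch is illustrative only and forms no part of the original analysis. The names `parse_alignment_summary` and `truncated_percent` are hypothetical, and dropping the fractional part of the percentage is an assumption that happens to reproduce the identity percentages printed in this section.

```python
import re

# Matches fields such as "Identities = 175/196 (89%)"; tolerant of stray spaces.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")

def parse_alignment_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {name: (int(count), int(length), int(percent))
            for name, count, length, percent in FIELD.findall(line)}

def truncated_percent(count, length):
    """Integer percentage with the fraction dropped (assumed convention)."""
    return 100 * count // length

summary = parse_alignment_summary(
    "Identities = 175/196 (89%), Positives = 187/196 (95%)")
count, length, reported = summary["Identities"]
print(count, length, reported, truncated_percent(count, length))
# prints: 175 196 89 89
```

A reported identity percentage that differs from the recomputed value would suggest that the counts were misread during text extraction.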

Example 2010 

A DNA sequence (GBSx2121) was identified in S.agalactiae <SEQ ID 6215> which encodes the amino
acid sequence <SEQ ID 6216>. This protein is predicted to be uracil phosphoribosyltransferase (upp). 
Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.43 Transmembrane 127 - 143 ( 127 - 144)
INTEGRAL Likelihood = -0.06 Transmembrane 72 - 88 ( 72 - 89)
INTEGRAL Likelihood = -0.06 Transmembrane 154 - 170 ( 154 - 170)

Final Results

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10063> which encodes amino acid sequence <SEQ ID
10064> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA26890 GB:L07793 uracil phosphoribosyltransferase [Streptococcus salivarius]
Identities = 192/209 (91%), Positives = 202/209 (95%)

Query: 1   MGKFQVISHPLIQHKLSILRRTTTSTKDFRELVDEIAMLMGYEVSRDLPLEDVEIQTPVA 60
           MGKFQVISHPLIQHKLSILRR  TSTKDFRELV+EIAMLMGYEVSRDLPLE+VEIQTP+
Sbjct: 1   MGKFQVISHPLIQHKLSILRREDTSTKDFRELVNEIAMLMGYEVSRDLPLEEVEIQTPIT 60

Query: 61  TTVQKQLAGKKLAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEETFQPVEYLVKLPE 120
            TVQKQL+GKKLAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEET +PVEYLVKLPE
Sbjct: 61  KTVQKQLSGKKLAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEETLEPVEYLVKLPE 120

Query: 121 DIDQRQIFVVDPMLATGGSAILAVDSLKKRGAASIKFVCLVAAPEGVAALQEAHPDVDIY 180
           DIDQRQIFVVDPMLATGGSAILAVDSLKKRGAA+IKFVCLVAAPEGV  LQ+AHPD+DIY
Sbjct: 121 DIDQRQIFVVDPMLATGGSAILAVDSLKKRGAANIKFVCLVAAPEGVKKLQDAHPDIDIY 180

Query: 181 TAALDEKLNEHGYIVPGLGDAGDRLFGTK 209
           TA+LDEKLNE+GYIVPGLGDAGDRLFGTK
Sbjct: 181 TASLDEKLNENGYIVPGLGDAGDRLFGTK 209



A related DNA sequence was identified in S.pyogenes <SEQ ID 6217> which encodes the amino acid 
sequence <SEQ ID 6218>. Analysis of this protein sequence reveals the following:

Possible site: 26
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 72 - 88 ( 72 - 89)
INTEGRAL Likelihood = -0.22 Transmembrane 127 - 143 ( 127 - 144)



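Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to uracil phosphoribosyltransferase from S.salivarius:

>GP:AAA26890 GB:L07793 uracil phosphoribosyltransferase [Streptococcus salivarius]
Identities = 191/209 (91%), Positives = 205/209 (97%)

Query: 1   MGKCQVISHPLIQHKLSILRRQTTSTKDFRELVNEIAMLMGYEVSRDLPLEDVDIQTPVS 60
           MGK QVISHPLIQHKLSILRR+ TSTKDFRELVNEIAMLMGYEVSRDLPLE+V+IQTP++
Sbjct: 1   MGKFQVISHPLIQHKLSILRREDTSTKDFRELVNEIAMLMGYEVSRDLPLEEVEIQTPIT 60

Query: 61  KTVQKQLAGKKLAIVPILRAGIGMVDGLLSLVPAAKVGHIGMYRNEETLEPVEYLVKLPE 120
           KTVQKQL+GKKLAIVPILRAGIGMVDG LSLVPAAKVGHIGMYR+EETLEPVEYLVKLPE
Sbjct: 61  KTVQKQLSGKKLAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEETLEPVEYLVKLPE 120

Query: 121 DINQRQIFLVDPMLATGGSAILAVDSLKKRGAANIKFVCLVAAPEGVKKLQEAHPDIDIF 180
           DI+QRQIF+VDPMLATGGSAILAVDSLKKRGAANIKFVCLVAAPEGVKKLQ+AHPDIDI+
Sbjct: 121 DIDQRQIFVVDPMLATGGSAILAVDSLKKRGAANIKFVCLVAAPEGVKKLQDAHPDIDIY 180

Query: 181 TAALDDHLNEHGYIVPGLGDAGDRLFGTK 209
           TA+LD+ LNE+GYIVPGLGDAGDRLFGTK
Sbjct: 181 TASLDEKLNENGYIVPGLGDAGDRLFGTK 209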


An alignment of the GAS and GBS proteins is shown below. 

Identities = 190/209 (90%), Positives = 201/209 (95%) 

Query: 1   MGKFQVISHPLIQHKLSILRRTTTSTKDFRELVDEIAMLMGYEVSRDLPLEDVEIQTPVA 60
           MGK QVISHPLIQHKLSILRR TTSTKDFRELV+EIAMLMGYEVSRDLPLEDV+IQTPV+
Sbjct: 1   MGKCQVISHPLIQHKLSILRRQTTSTKDFRELVNEIAMLMGYEVSRDLPLEDVDIQTPVS 60

Query: 61  TTVQKQLAGKKLAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEETFQPVEYLVKLPE 120
            TVQKQLAGKKLAIVPILRAGIGMVDG LSLVPAAKVGHIGMYR+EET +PVEYLVKLPE
Sbjct: 61  KTVQKQLAGKKLAIVPILRAGIGMVDGLLSLVPAAKVGHIGMYRNEETLEPVEYLVKLPE 120

Query: 121 DIDQRQIFVVDPMLATGGSAILAVDSLKKRGAASIKFVCLVAAPEGVAALQEAHPDVDIY 180
           DI+QRQIF+VDPMLATGGSAILAVDSLKKRGAA+IKFVCLVAAPEGV  LQEAHPD+DI+
Sbjct: 121 DINQRQIFLVDPMLATGGSAILAVDSLKKRGAANIKFVCLVAAPEGVKKLQEAHPDIDIF 180

Query: 181 TAALDEKLNEHGYIVPGLGDAGDRLFGTK 209
           TAALD+ LNEHGYIVPGLGDAGDRLFGTK
Sbjct: 181 TAALDDHLNEHGYIVPGLGDAGDRLFGTK 209

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
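
The identity count reported for an alignment is the number of aligned positions at which the Query and Sbjct rows carry the same residue. As a minimal sketch, assuming two rows of equal length copied from the last row (residues 181 to 209) of the GAS/GBS alignment of this Example, the count can be reproduced as follows. The function names are hypothetical. The "+" symbols of the printed middle line mark conservative substitutions, which depend on a scoring matrix and are not reproduced here.

```python
def identity_midline(query, sbjct):
    """Identity-only middle line: the shared residue where both rows agree, else a space."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return "".join(q if q == s and q != "-" else " "
                   for q, s in zip(query, sbjct))

def count_identities(query, sbjct):
    """Number of aligned positions carrying the same residue (gap columns excluded)."""
    return sum(ch != " " for ch in identity_midline(query, sbjct))

# Last row of the GAS/GBS alignment in Example 2010.
query = "TAALDEKLNEHGYIVPGLGDAGDRLFGTK"
sbjct = "TAALDDHLNEHGYIVPGLGDAGDRLFGTK"
print(count_identities(query, sbjct), len(query))
# prints: 27 29
```

Summing this count over all rows of an alignment should give the numerator of the Identities field.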

Example 2011 

A DNA sequence (GBSx2122) was identified in S.agalactiae <SEQ ID 6219> which encodes the amino 
acid sequence <SEQ ID 6220>. This protein is predicted to be hemolysin (patB). Analysis of this protein 
sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.29 Transmembrane 88 - 104 ( 86 - 106)

Final Results

bacterial membrane --- Certainty=0.2317 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15133 GB:Z99120 aminotransferase [Bacillus subtilis] 
Identities = 130/381 (34%) , Positives = 221/381 (57%) , Gaps = 4/381 (1%) 

Query: 5 DFTSLPERFSSNTIKWKAVQK---DQEILPLWIADMDFPIFPEMSEAIEDFSHQMVFG'HD 61 
+F ER + ++KW + + LP+W+ADMDF ++EA+++ +FGY

Sbjct: 2 NFDKREERLGTQSVKWDKTGELPGVTnaLPMfmD^roFRRPEAITEaLKEIy^ 61 

Query: 62 SPKDSLYQAISNWEVQEHGYQFDKKSLLLIDGWPAISVAIQAFTKEGDAVLINTPVYPP 121 
+P A+ W HG++ + +S+ GW A+S+A+QAFT+ GD V++ PVY P 

Sbjct: 62 TPDQKIKnAVCGWMQI!M^GWKMWESITFSPGVVTALS^aVQAFTEPGDQVWQPPVY^ 121

Query: 122 FARTIKXNNRHLVSNSLtNraQYFEIDFKQLEKDIIENl^^ 181
F ++ N RH++ N LL + + IDF+ LE + + +V L+I C+PHNP 6R W++ + 
Sbjct: 122 PYHMVEKNGRHILHNPI^EKDGaYAIDFEDLETKLSDPSVTLFirjCNPHNPSGRSWSRED 181 


Query: 182 IQKIGDICKRYNVILVSDEIHQDLVLFDNVHHSFNTVDSSFKELSVILSSATKTFNIAGT 241 

+ K+G++C + V +VSDEIH DL+L+ + H F ++ F ++SV ++ +KTENIAG 
Sbjct: 182 LLKLGELCLEHGVTVVSDEIHSDLMLYGHKHTPFASLSDDFADISVTCAAPSKTENIAGL 241 

Query: 242 KNSPAIIEtffiKLRSDFKKRQIAMNQQEISSIfiLIATEVAFTKEKQraiKRLKMELBGSIE^ 301

+ S II + R+ F N +++ + A E A++K WL L +E ++ 

Sbjct: 242 QASAlilPDRLKRAKFSASLQRNGIXSGimFAVTAIEAAYSKGGPWLDEXiITYIEKNMNE 301 

Query: 302 LYEQL-TQKTHIKVMKPEGTYLWLDFSAYWLTHLEIQEKLRYDAKLIIjroGLTFGKEGK 360 
L T+ +K+MKP+ +YL+WLDFSAY L+ E+Q+++ K+IL G +G G+

Sbjct: 302 AEAFLSTELPiancmKPDASYLIWLDFSAYGLSDAELQQRMDKKGKVILETO^ 361 

Query: 361 KHARINVAAPRSVIEEAVLRL 381 
R+N + +++ + R+ 

Sbjct: 362 GFMRUSIAGCSLATLQDGLRRI 382

There is also homology to SEQ ID 1006. 

SEQ ID 6220 (GBS392) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 75 (lane 2; MW 46.4kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 5; MW 71kDa).

GBS392-GST was purified as shown in Figure 217, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2012 

A DNA sequence (GBSx2123) was identified in S.agalactiae <SEQ ID 6221> which encodes the amino
acid sequence <SEQ ID 6222>. This protein is predicted to be rRNA methylase, SpoU family (cspR). 
Analysis of this protein sequence reveals the following: 



Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1436 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB02738 GB:U58864 CspR [Bacillus subtilis]
Identities = 84/155 (54%), Positives = 120/155 (77%), Gaps = 3/155 (1%)

Query: 19 HIVLFEPQIPANTGNIARTCfiATNAPLHIIRPMGPPIDDKraKKAGLDYWDKLDVSFYDG 78 

H+VL++P+IPANTGNIARTCAATN LH+IRP+GF DDK +KRAGLDYW+ ++V ++D 
Sbjct: 4 HVTOiYQPEIPAimSNIARTCAATNTTUmiRPLGFSTDDKMLKRAGLDXWEFVNV^ 63 

Query: 79 LEE-FMLSCRGKOTIiISKFADKVYSDENYND-DQDHyFMFGREDKGLPETFMREHAEKAL 136 

LEE F +GK I+KF + ++ +Y D D+D++F+FGRE Gi:iP+ ++ + ++ L 
Sbjct: 64 LEELFEAYKKGKFFFITKFGQQPHTSFDYTDLDEDYFFVFGRETSGLPKDLIQNNMDRCL 123 

Query: 137 RIPMNDEHVRSUWSNTVCMIVYEALRQQSFPNLE 171 

R+PM EHVRSIiN+SNT ++VYEALRQQ++ +L+ 
Sbjct: 124 RLPMT-EHVRSUSLSNTAAILVyEaLRQQNYRDIiK 157 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6223> which encodes the amino acid 
sequence <SEQ ID 6224>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2236 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 135/182 (74%) , Positives = 150/182 (82%) 

Query: 1 MNIETLTQKNHRSDSGRNHIVLFEPQIPANTGNIARTCAATNAPLHIIRPMGFPIDDKKM 60

M + L KN + RNHIVLF+PQIP NTGNIARTCAATHAPLHII+PMGFPIDD+KM 
Sbjct: 13 ^lTTKELIlmlDKVKKaRlraIVLFQPQIPQNTGNIARTCAATSIAPLHIIK^ 72 

Query: 61 KRAGLDYWDKLDVSFYDGLEEFMLSCRGKVHLISKFADKVYSDENYNDDQDHYFMFGRED 120 

KRAGLDYWDKL++ FYD LE+F+ C G++HLISKFA YS YD HYF+FGRED 
Sbjct: 73 KRAGLDYWDKLELHFYDHLEQFINQCHGQLHLISKFAVNNYSQATYADGDSHYFLFGRED 132 

Query: 121 KGLPETFMREHAEKALRIPMNDEHVRSLNVSNTVCMIVYEALRQQSFPNLELSHTYENDK 180
GLPE FMREHAEKALRIPMNDEHWSLNVSNTVCM++YEALRQQ F LEL HTYE+DK 
Sbjct: 133 TGIiPEDFMREHAEKALRIPMNDEHVRSIJOTSimrCM\n;YEMJlQQGFQGLELKHTYEH^ 192 

Query: 181 LK 182 

LK

Sbjct: 193 LK 194 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2013 

A DNA sequence (GBSx2124) was identified in S.agalactiae <SEQ ID 6225> which encodes the amino 
acid sequence <SEQ ID 6226>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.79 Transmembrane 82 - 98 ( 69 - 100)
INTEGRAL Likelihood = -6.48 Transmembrane 27 - 43 ( 24 - 47)
INTEGRAL Likelihood = -5.52 Transmembrane 132 - 148 ( 126 - 151)
INTEGRAL Likelihood = -5.10 Transmembrane 162 - 178 ( 161 - 185)

Final Results

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9411> which encodes amino acid sequence <SEQ ID 9412>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13143 GB:Z99110 similar to amino acid permease [Bacillus subtilis]

Identities = 46/143 (32%) , Positives = 81/143 (56%) , Gaps = 1/143 (0%) 

Query: 3 FAYDGVWIEWIAPEVKNPKKmPLAFVIGPALILLSYLAFFYGLTQIIXSASFIMTTCS^ 62 
FAYDGW + + E+KNP+K LP A G ++ y+ + L IL A+ I+T G +
Sbjct: 203 FAYDGWILIiAALGGEMKNPEKLLPRAMTGGLLIVTAIYIFINFALI^ILSANEIVTLGEN 262

Query: 63 AINYAANIIFGPSVGRLLSFIVILSVLGVANGLLLGTMRLPQAFAERGWIK-SERMANIN 121 

A + AA ++FG G+L+S +I+S+ G NG +L R+ A AER + +E++++++ 
Sbjct: 263 ATSTAATMLFGSIGGKLISVGIIVSIFGCLNGKVLSFPRVSFAMAERKQLPFAEKLSHVH 322 


Query: 122 LKYQMSLPASLTVTAVAIFWLFV 144 

++ A A+A+ + + 
Sbjct: 323 PSFRTPWIAISFQIALALIMMLI 345 

There is also homology to SEQ ID 3114.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
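
Each INTEGRAL line of the analysis above reports one predicted transmembrane segment, with a likelihood value and the residue range of the segment. The following Python sketch, which is illustrative only, extracts these fields from the four lines reported for this Example. The function name `parse_integral` is hypothetical, and ordering the segments by likelihood assumes, as the certainty values in these reports suggest, that a more negative likelihood denotes a more confident prediction.

```python
import re

# One predicted transmembrane segment per "INTEGRAL" line.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)")

def parse_integral(text):
    """Return (likelihood, start, end) for each segment, most negative likelihood first."""
    hits = [(float(m.group(1)), int(m.group(2)), int(m.group(3)))
            for m in INTEGRAL.finditer(text)]
    return sorted(hits)

report = """
INTEGRAL Likelihood = -6.79 Transmembrane 82 - 98 ( 69 - 100)
INTEGRAL Likelihood = -6.48 Transmembrane 27 - 43 ( 24 - 47)
INTEGRAL Likelihood = -5.52 Transmembrane 132 - 148 ( 126 - 151)
INTEGRAL Likelihood = -5.10 Transmembrane 162 - 178 ( 161 - 185)
"""
segments = parse_integral(report)
print(len(segments), segments[0])
# prints: 4 (-6.79, 82, 98)
```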

Example 2014 

A DNA sequence (GBSx2125) was identified in S.agalactiae <SEQ ID 6227> which encodes the amino 
acid sequence <SEQ ID 6228>. Analysis of this protein sequence reveals the following:

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1849 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9439> which encodes amino acid sequence <SEQ ID 9440> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD23454 GB:AF117741 cochaperonin GroES [Streptococcus pneumoniae] 

Identities = 31/52 (59%) , Positives = 42/52 (80%) 

Query: 2 GDGIRTLTGELVAPSVAEGDTVLVENGAGLEVKDGNEKVTWRESDIVAWK 53

G G+RTL G+LVAPSV GD VLVE AGL+VKDG+EK +V E++I+A+++ 
Sbjct: 42 GQGVRTLNGDLVAPSVKTGDRVLVEaHAGIiDVKDGDEKyilVGERlIILAIIE 93 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6229> which encodes the amino acid
sequence <SEQ ID 6230>. Analysis of this protein sequence reveals the following:

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3290 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 29/49 (59%) , Positives = 39/49 (79%) 

Query: 4 GIRTLTGELVAPSVAEGDTVLVENGRGLEVKDGNEKVTVVRESDIVAW 52 
G+RT+TG+ V PSV+ G VLVENG LEV +EKV+++RESDI+A+V

Sbjct: 60 GVRTITGDSVLPSVSVGQEVLVENGHDLEVTVDDEKVSIIRESDIIAIV 108 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
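
The "Final Results" block of each analysis lists a certainty value for every candidate localisation site, and the predicted site is the one with the highest value. As a minimal sketch, assuming report lines of the form shown in this Example, the block can be read into a dictionary and the predicted site selected as follows. The function names are hypothetical, and the regular expression is deliberately tolerant of the irregular spacing that text extraction introduces into such reports.

```python
import re

# Tolerates irregular spacing in scanned reports, e.g. "Certainty=0. 3290".
RESULT = re.compile(
    r"bacterial\s+(\w+)[^A-Za-z]*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)")

def parse_final_results(text):
    """Map each localisation site to (certainty, verdict)."""
    results = {}
    for site, value, verdict in RESULT.findall(text):
        results[site] = (float("".join(value.split())), verdict.strip())
    return results

def predicted_site(results):
    """Site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])

gbs = parse_final_results("""
bacterial cytoplasm --- Certainty=0.1849 (Affirmative)
bacterial membrane --- Certainty=0.0000 (Not Clear)
bacterial outside --- Certainty=0.0000 (Not Clear)
""")
print(predicted_site(gbs), gbs["cytoplasm"])
# prints: cytoplasm (0.1849, 'Affirmative')
```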

Example 2015

A DNA sequence (GBSx2126) was identified in S.agalactiae <SEQ ID 6231> which encodes the amino
acid sequence <SEQ ID 6232>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1272 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD23455 GB:AF117741 chaperonin GroEL [Streptococcus pneumoniae] 
Identities = 472/539 (87%) , Positives = 513/539 (94%) , Gaps = 1/539 (0%) 



Query: 1   MAKDIKFSADARSAMTOGVDILADTVKVTLGPKGRNVVLEKAFGSPLITNDGVTIAKEIE 60
           M+K+IKFS+DARSAMVRGVDILADTVKVTLGPK RNWLEK+FGSPLITNDGVTIAKEIE
Sbjct: 1   ^©KEIKFSSDftRSAMVRGVDIIADTVKVTLGPKDRNVVLEKSFGSPLITNDGVTIAKEIE 60

Query: 61  LEDHFENMGAKLVSEVASKTiroiAGDGTTTATVLTQAIVREGLKNVTAGKNPIGIRRGIE 120
           LEDHFENMGAKLVSE+ASKTNDIAGDGTTTATVLTQAIVREG+KNVTAGANPIGIRRGIE
Sbjct: 61  LEDHFENMGAKLVSEIASKITOIAGDGTTTATVLTQAIVREGIKNVTAGANPIGIRRGIE 120

Query: 121 TAVSAAVEEIiKEIAQPVSGKEAIAQVAAVSSRSEKVGEYISEAMERVGNDGVITIEESRG 180
           TAV+AAVE LK A PV+ KEAI+QVAAVSSRSEKVGEYISEAME+VG DGVITIEESRG
Sbjct: 121 TAVAAAVEfiLKNNAIPVaNKEAISQVAAVSSRSEKVGEYISEAMEKVGKDGVITIEESRG 180

Query: 181 METELEWEGMQFDRGYLSQVMVTDNEKMVSELENPYILITDKKISNIQEILPLLEEVLK 240
           METELEVVEGMQFDRGYLSQYMVTD+EKMV++LENPYILXTDKKISNIQEILPLLE +L+
Sbjct: 181 METELEWEGMQFDRGYLSQYMVTDSEKMVADLENPYILITDKKISNIQEILPLLESILQ 240

Query: 241 TNRPLLIIADDVDGEALPTLVLNKIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTWT 300
           +NRPIiLIIRDDVDGERIiPTLVimiRGTFNVVAVKAPGFGDRRKRMLEDIAILTGGTV+T
Sbjct: 241 SNRPLLIIADDVDGEALPTLVEJIKIRGTFNVVRVKAPGFGDRRKAMLEDIAILTGGTVIT 300

Query: 301 EDLGLDLKDATMQVLGQSAKVTVDKDSTVIVEGAGDSSAIANRVAIIKSQMEATTSDFDR 360
           EDLGL+LKDAT++ LGQ+A+VTVDKDSTVIVEGAG+ AI++RVA+IKSQ+E TTS+FDR
Sbjct: 301 EDLGLELKDATIEALGQAARVTVDKDSTVIVEGAGNPEAISHRVAVIKSQIETTTSEFDR 360

Query: 361 EKLQERLAKLAGGVAVIKVGAATETELimoa^RIEDAIiNATRAAVEEGIVSGGGTALVNV 420
           EKLQERUmi+GGVAVIKVGAATETELKEMKLRIEDAIJlATRAAVEEGIV+GGGTAL NV
Sbjct: 361 EKLQERLAKLSGGVAVIKVOAATETELKEMKLRIEDfiimTRAAVEEGIVAGGGTAL^ 420

Query: 421 lEKVAALKUSrGDEETGRNIVLRALEEPWQIAYWAGYEGSVIIERLKQSEIGTGFNAANG 480
           I A L+L GDE TGRNIVLRALEEPVRQIA+NAG+EGS++I+RLK +E+G GENAA G
Sbjct: 421 IPAEATLELTGDEATGRNIVLRALEEPVRQIAHNAGFEGSIVIDRLKNAELGIGFNAATG 480

Query: 481 EWVDMVTTGIIDPVKVTRSftLQHRASVASLILTTEAVVaNKPEPEAPTAPAMDPSMMGG 539
           EWV+M+ GIIDPVKV+RSALQNAASVASLILTTEAWANKPEP AP APAMDPSMMGG
Sbjct: 481 EWVtmiDQGIIDPVKVSRSALQNAASVASLILTTEAVVaNKPEPVAP-APAMDPSWIMGG 538

A related DNA sequence was identified in S.pyogenes <SEQ ID 6233> which encodes the amino acid
sequence <SEQ ID 6234>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1070 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 491/543 (90%), Positives = 515/543 (94%), Gaps = 3/543 (0%)



Query: 1   MAKDIKFSADARSAMVRGVDILADWKOTLGPKGROTVLEKAFGSPLITNDGWIAKKIE 60
           MAKDIKFSADAR+jyWRGVD+IMTVKVTLGPKGRNVVLEKAFGSPLITNDGVTIAKEIE
Sbjct: 3

Query: 61  LEDHFENMGAKLVSEVZ^KTlSnDIAGDGTTTATV^ 120
           LEDHFEDSMGAKLVSEVASKTNDIAGDGTTTATVLTQAIV EGLKNVTAGANPIGIRRGIE
Sbjct: 63  122

Query: 121 TAVSAAVEELKEIAQPVSGKEAIAQVAAVSSRSEKVGEYISEMERVGNDGVITIEESRG 180
           TA + AVE LK lAQPVSGKEAIAQVAAVSSRSEKVSEYISEAMERVGNDGVITIEESRG
Sbjct: 123

Query: 181 METELEWEGMQFDRGYLSQYMVTDNEKMVSELENPYIIiITDKKISNIQEILPLLEEVLK 240
           METELEWEGMQFDRGYLSQYMVTDNEKMV++LENP+ILITDKK+SNIQ+ILPLLEEVLK
Sbjct: 183 METELEVWGMQFDRGYLSQYMVTDNEKMVaDLENPPILITDKKVSNIQDILPLLEEVLK 242

Query: 241 TNRPLLIIADDVDGEALPTLVLNKIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTWT 300
           TNRPLLIIADDVDGEALPTLVLNKIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTV+T
Sbjct: 243 TNRPLLIIM)DVDGERLPTLVLNKIRGTEIWVAVKAPGFGDRRKaMI,EDIAILTGGTVIT 302

Query: 301 EDMLDLKmTMQVLGQSAKVTVDKDSTVIVEGaGDSSaiAKIRVAIIKSQMEA 360
           EDLGL+LKDATM LGQ+AK+TVDKDSTVIVEG+G S AIANR+A+IKSQ+E TTSDFDR
Sbjct: 303 EDIfiLELKDATMTAIjGQRAKITVDKDSTVIVEGSGSSERIftNRIALlKSQLETTTSDFDR 362

Query: 361 EKIQERLAKLAGGVAVIKVGAATETELKEMKLRIEDALNATRAAVEEGIVSGGGTALVNV 420
           EKLQERLAKLAGGVAVIKVGA TET LKEMKI1RIEDALNATRAAVEEGIV+GGGTAL+ V
Sbjct: 363 EKLQERLAKLAGGVAVIKVGAPTETALKEMKIJlIEnaimTRAAVEEGIVaGGGTAL 422

Query: 421 lEKVAALKLNGDEETGRNIVLRALEEPVRQIAYNAGYEGSVIIERLKQSEIGTGFNAANG 480
           lEKWAAL+L GD+ TGRNIVLRflLEEPVRQIA NRGYEGSV+I++LK S GTGFNAA G
Sbjct: 423 lEKWiaLELEGDDATGRNIVLRALEEPVRQIAIiNAGYEGSVVIDKLKNSPASTGFNAATG 482

Query: 481 EWVDMVTTGIIDPVKVTRSALQNAASVASLHiTTEAWANKPEP--EAPTAPA-MDPSMM 537
           EWVDM+ TGIIDPVKVTRSALQNAASVASLILTTEAWBNKPEP AP PA MDP MM
Sbjct: 483 EVTOMIK3GIXDPVKOTRSAI<31CUiS\ffiSLILTTEAVVANKPEPATPAPAMPAGMDPGiyiM 542

Query: 538 GGF 540
           GGF
Sbjct: 543 GGF 545





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2016 

A DNA sequence (GBSx2127) was identified in S.agalactiae <SEQ ID 6235> which encodes the amino 
acid sequence <SEQ ID 6236>. Analysis of this protein sequence reveals the following:

Possible site: 40
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3216 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10247> which encodes amino acid sequence <SEQ ID 
10248> was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06113 GB:AP001515 transcriptional regulator (GntR family)
[Bacillus halodurans] 
Identities = 50/171 (29%) , Positives = 86/171 (50%) , Gaps = 17/171 (9%) 

Query: 21 HVQVYNKlFmiQDGTYSPGMQLPSEPElAGQIiOTSRATriRKSLaLLQEDHLVKNIRGKG 80 

++QV +K+ + ++ G Y G +LPSE EL+ QL VSRATLR++L LL+E+ +V G G 
Sbjct: 10 YLQVIDKLKHDMEAGVYEEGEKLPSEFELSKQLGVSRATLREALRLLEEEGVWRRHGVG 69 

Query: 81 NFIRENSSNLSENGYENRQHPIKTCLTSKITEVELE FRVEVPAEAITASLKQ 132 

P+ ++ L G E +T I ++E +++E + 

Sbjct: 70 TFV— HTKPLPSaGIEELY SVTDMIRHADMEPGTIFLSSYQIEATDDDKRRPQTD 122 

Query: 133 ETPWVIADRWYHTDDGPLAYTLSFIPIELISDAEISLHDTKQLLNFIEEG 183 

+++ +R D P+ Y L +P ELI + S+H+ +L+ +E G 
Sbjct: 123 NLDQriMMIERVRTADGVPIVYCLDKLPAELI--GQHSVHEINSILDHLESG 171 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6237> which encodes the amino acid 
sequence <SEQ ID 6238>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2297 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 154/244 (63%) , Positives = 189/244 (77%) 



Query: 7   MPKNELNNKUJKLKHVQVYNKIENMIQDGTYSPGMQLPSEPELA^ 66
           M N+L KL KLKHVQVYN IF H-lQDGTYSPGMQIiPSEPEIiA QMIVSR TLRKSLAL
Sbjct: 1   MSTNDLTKKI.KKLKHVQVYNTIFQLIQDGTYSPGMQLPSEPELaRQIlNVSR^mlRKSL^ 60

Query: 67  LQEDHLVKNIRGKGNFIRENSSKLSENGYENRQHPIKTCLTSKITEVELEFRVEVPAEAI 126
           LQEDHL+KNIRGKGNFI + G+E QHPI L+S IT+VELE+R+EVP AI
Sbjct: 61  LQEDHLIKNIRGKGNFILKTPETKYHQGFEYLQHPIYASLSSDITKVELEYRIEVPTVAI 120

Query: 127 TASLKQETFVWIADRWYHTDDGPIAYTLSFIPIELISEffiEISiaiDTKQIjIlJ 186
           TASLKQETPW+I DRWYH+ + +AY+LSPIPIE+IS I+L+ + LL F+EE IY+
Sbjct: 121 TASLKQETPWIIVDRWYHSQNKAIAYSLSFIPIEVISKYAININQEEPLLTFLEEKIYE 180

Query: 187 EGISSHSQSHLGYATSGNFSATKYTLSDHGQFILIQETIFKQEKILMCNKHYVPIEHFEL 246
           G +SHS + +GY +GN++ATKYTLS++ FILIQET++ + IL+ KHYVP + F+L
Sbjct: 181 SGKASHSCMQIGYTKTGim^ATKyrLSENSAFILIQETliYNGKDILVSTKHYVPAm 240

Query: 247 SITS 250
            + S
Sbjct: 241 KVQS 244





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2017

A DNA sequence (GBSx2128) was identified in S.agalactiae <SEQ ID 6239> which encodes the amino 
acid sequence <SEQ ID 6240>. This protein is predicted to be purine nucleoside phosphorylase (udp-1). 
Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3910 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC65977 GB:AE001270 uridine phosphorylase (udp) [Treponema pallidum]

Identities = 145/246 (58%), Positives = 171/246 (68%)

Query: 11 QYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTLNGEKVSVTST 70
+YH+ ++ D+G YVI+PGDP R KIA+HF + V +REYVTYTGTL VSV ST 
Sbjct: 10 EYHIGLKASDIGHYVILPGDPARSEKIAQHFSHPHKVGHNREYVTYTGTLCETPVSVMST 69

Query: 71 GIGGPSASIAMEELKLCGADTFIRVGTCGGIDLDVKGGDIVIATGAIRMEGTSKEYAPIE 130 

GIGGPS +1 +EEL GA TFIRVGT GG+ D+ G +VIATGAIR EGTSKEYAP+E 
Sbjct: 70 GIGGPSTAIGVEELIHLGAHTFIRVGTSGGMQPDILAGTWIATGAIRFEGTSKEYAPVE 129 


Query: 131 FPAVADLEVTNALVNAAKKLGYTSHAGVVQCKDAFYGQHEPERMPVSYELLNKWEAWKRL 190
FPAV D VT AL +AA+ + GWQCKD FYGQH P MPV EL KW AW 

Sbjct: 130 FPAVPDFTOTAALKHAAEDVQVRHaifiWQCKDNFYGQHSPHTMPVHRELTQKWHAWIAC 189

Query: 191 GTKASEMESAALFVAASHLGVRCGSDFLVVGNQERNALGMDNPMAHDTEAAIQVAVEALR 250

T ASEMESARLEV S VR G+ LV+GNQ R A G+++ HDTE AI+VAVEA++ 

Sbjct: 190 OTI^SEMESAALFVLGSVRRVRTGAVLLVIGNQTRRAQGLEDIQVHDTENAIRVAVEAVK 249

Query: 251 TLIEND 256 
LI D

Sbjct: 250 LLITQD 255 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6241> which encodes the amino acid 
sequence <SEQ ID 6242>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3910 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 259/259 (100%) , Positives = 259/259 (100%) 

Query: 1   MQNYSGEVGLQYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTL 60
           MQNYSGEVGLQYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTL
Sbjct: 1   MQNYSGEVGLQYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTL 60

Query: 61  NGEKVSVTSTGIGGPSASIAMEELKLCGADTFIRVGTCGGIDLDVKGGDIVIATGAIRME 120
           NGEKVSVTSTGIGGPSASIAMEELKLCGADTFIRVGTCGGIDLDVKGGDIVIATGAIRME
Sbjct: 61  NGEKVSVTSTGIGGPSASIAMEELKLCGADTFIRVGTCGGIDLDVKGGDIVIATGAIRME 120

Query: 121 GTSKEYAPIEFPAVADLEVTNALVNAAKKLGYTSHAGVVQCKDAFYGQHEPERMPVSYEL 180
           GTSKEYAPIEFPAVADLEVTNALVNAAKKLGYTSHAGVVQCKDAFYGQHEPERMPVSYEL
Sbjct: 121 GTSKEYAPIEFPAVADLEVTNALVNAAKKLGYTSHAGVVQCKDAFYGQHEPERMPVSYEL 180

Query: 181 LNKWEAWKRLGTKASEMESAALFVAASHLGVRCGSDFLVVGNQERNALGMDNPMAHDTEA 240
           LNKWEAWKRLGTKASEMESAALFVAASHLGVRCGSDFLVVGNQERNALGMDNPMAHDTEA
Sbjct: 181 LNKWEAWKRLGTKASEMESAALFVAASHLGVRCGSDFLVVGNQERNALGMDNPMAHDTEA 240

Query: 241 AIQVAVEALRTLIENDKSQ 259
           AIQVAVEALRTLIENDKSQ
Sbjct: 241 AIQVAVEALRTLIENDKSQ 259

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2018 

A DNA sequence (GBSx2129) was identified in S.agalactiae <SEQ ID 6243> which encodes the amino 
acid sequence <SEQ ID 6244>. This protein is predicted to be nucleoside transporter. Analysis of this 
protein sequence reveals the following:

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10245> which encodes amino acid sequence <SEQ ID 
10246> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05165 GB:AP001512 nucleoside transporter [Bacillus halodurans] 
Identities = 160/405 (39%) , Positives = 256/405 (62%) , Gaps = 8/405 (1%) 



Query: 5   MQFIYSIIGILLVLGIVYAISFNRKSVSLSLI6KAL1VQFIIALILVRIPLGQQWSWS 64
           M ++ ++GI++V I +A S NR+++ I L +Q + A+I+++IP GQ ++ ++
Sbjct: 1   MNILWGLLGIVWFLIAFAFSTNRRAIKPRTILGGLAIQLLPAIIVLKIPAGQALLESLT 60

Query: 65  TGVTKVXNCGQftGLNFVFGSLftDSGAKTGFXFAIQTLGNIVFLSALVSLLYYVGILGFVV 124
           V +1+ e++FVFG + G+ 6F+FAI L ++P SAL+S+LYY+GI+ FV+
Sbjct: 61  NWLNIISYANEGIDFVFGGFFEEGSGVGFVFAINVLSWIFFSALISILYYLGIMQFVI 120

Query: 125 KWIGKGVGKIMKSSEVESFVAVANMFLGQTDSPILVSKYLGRMTDSEIMWLVSGMGSMS 184
           K IG + ++ +S+ ES A AN+F+GQT++P++V YL +MT SE+ V+ G+ S++
Sbjct: 121 KIIGGRLSWLLGTSKAESMSAAANIFVGQTEAPLVVKPYLPKMTQSELFAVMTGGLaSVA 180

Query: 185 VSILGGYIALGIPMEYLLIASTMVPIGSILIAKILLPQTEPVQKI-DDIKMDNKGNNANV 243
           S+L GY LG+P++YLL AS M +++AK+++P+TE DD K+ + N+
Sbjct: 181 GSVXiIGYSLLGVPLQYLLAASFMftAPAGLIMAKMIMPETEKTTDAEDDFKLAKDEESTNL 240

Query: 244 IDAIAEGASTGAQMAFSIGASLIAFVGLVSLINMMLSGLG IRLEQIPSYVFAP 296
           IDA A GASTG + +1 A L+AFV L++LIN +L +G + LE I YVFAP
Sbjct: 241 IDAAANGASTGLMLVLNIAAMLLAFVALIALINGILGWIGGLFGASQLSLELILGYVFAP 300

Query: 297 FGFLMGFDHKNILLEGNLLGSKLILNEFVSFQQLGDLIKSLDYRTALVATISLCGFANLS 356
           F++G L G+ +G ia:.++NEFV++ I++L + +V + +LCGFAN S
Sbjct: 301 LAWIGIPWAEALQAGSYIGQKLVVNEFVAYLSFAPEIENLSDKAVMVISFALCGFANFS 360

Query: 357 SLGICVSGIAVLCPEKRGTLARLVFKMCGGIAVSMLSAFIVGIV 401
           SLGI + G+ L P +R +ARL RA++ G S+LSA I G++
Sbjct: 361 SLGILLGGLGKLAPSRRPDIARLGLRAILAGTLASLLSASIAGMIi 405

A related DNA sequence was identified in S.pyogenes <SEQ ID 6245> which encodes the amino acid 
sequence <SEQ ID 6246>. Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:BAB05165 GB:AP001512 nucleoside transporter [Bacillus halodurans] 
Identities = 160/405 (39%) , Positives = 257/405 (62%) , Gaps = 8/405 (1%) 

Query: 5 MQFIYSIIGILLVLGIVYAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQIVSWS 64 
M ++ ++GI++V I +A S NR+++ I L +Q + A+I+++IP GQ ++ ++

Sbjct: 1 MNILWGLLGIVWFLIAFAFSTMRRAIKPRTILGGIiAIQLLFAIIVLKIPAGQaLLESLT 60 

Query: 65 TGVTSVINCGQAGLNFVFGSLADSGAKTGFIFAIQTLGNIVFLSALVSLLYYVGILGFW 124 

V ++I+ G++FVFG 4- G+ GF+FAI L ++F SAL+S+LYY+GI+ FV+ 
Sbjct: 61 NWLNIISYANEGIDFVFGGFFEEGSGVGFVFAINVLSWIFFSALISILYYLGIMQFVI 120

Query: 125 KMIGKeVGKIMKSSEVESFVAVANMFLGQTDSPILVSKYLGRMTDSEIMVVLVSGMGSMS 184 

K IG + ++ +S+ ES A AN+F+GQT++P++V YL +MT SE+ V+ G+ S++ 
Sbjct: 121 KIIGGALSWLLGTSKftESMSAAaNIFVGQTEAPLWKPYLPKMTQSELFAVMTGGLASVa 180 


Query: 185 VSILGGYIALGIPMEYLLIASTMVPIGSILIAKILLPQTEPVQKI-DDIKMDNKGNNANV 243 

S+L GY LG+P++YLL AS M +++AK+++P+TE DD K+ + N+ 

Sbjct: 181 GS^7LIGYSLLGOTLQYLLAASFMAAPAeLIMfiIMIMPETEKTTnREDDFKL^^ 240 

Query: 244 IDAIAEGASTGAQMAFSIGASLIAFVGLVSLINMMLSGLG IRLEQIFSYVFAP 296

IDA A GASTG + +1 A L+AFV L++LIN +L +G + LE I YVFAP 

Sbjct: 241 IDAftANGASTGLMLVUIIAAMLLAFVALIALINGILGWIGGLFGASQLSLELILGYVFAP 300 

Query: 297 FGFLMGFDHKNILLEGNLLGSKLILNEFVSFQQLGHLIKSLDYRTALVATISLCGFANLS 356 

F++G L G+ +G KL++NEFV++ I++L + +V + +LCGFAN S

Sbjct: 301 LAFVIGIPWAEALQAGSYIGQKLWNEFVAYLSFAPEIENLSDKAVMVISFALCGFANFS 360 

Query: 357 SLGICVSGIAVLCPERRSTLARLVFRAMIGGIAVSMLSAFIVGIV 401 
SLGI + G+ L P +R +ARL RA++ G S+LSA I G++ 
Sbjct: 361 SLGILLGGLGKLAPSRRPDIARLGLRAILAGTLASLLSASIAGML 405

An alignment of the GAS and GBS proteins is shown below. 

Identities = 399/404 (98%) , Positives = 401/404 (98%) 

Query: 1 MEVIMQFIYSIIGILLVLGIVYAISENRKSVSLSLIGKALIVQPIIALILVRIPLGQQW 60

+EVIMQFIYSIIGILLVLGIVYAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQ+V 
Sbjct: 1 LEVIMQFIYSIIGILLVLGIVYAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQIV 60 



Query: 61 SOTSTGVTKVINCGQAGLNFVFGSLRDSGRKTGFIFAIQTLGNIVFLSALVSLLYYVGIL 120 
SWSTGVT VINCGQft.GLNFVFGSLaDSGaKTGFIFAIQTLGNIVFLSALVSLLYYVGIL
Sbjct: 61 SWSTGVTSVINCGQAGLNFVFGSLADSGaKTGFIFAIQTLGNIVFLSMiVSLLYYVGIL 120 

Query: 121 GFVVKKIGK!GVGKIMKSSEVESFVAVAlmFLGQTDSPILVSKYLGRMTDSEI^mLVSG 180 

GFVVKWIGRGVGKIMKSSEVESFVa\«amFIJ3QTDSPILVSKYK3RMro^ 
Sbjct: 121 GFVVKWIGKGVGKIMKSSEWSFV?Wffi[mPI<3QTDSPILVSKYLGRMTDSEIM^ 180 



Query: 181 GSMSVSILGGYIALGIPMEYLLIASTMVPIGSILIAKILLPQTEPVQKIDDIKMDNKGNN 240 

GSMSVSILGGYIALGIPMEYLLIASTMVPIGSILIAKILLPQTEPVQKIDDIKMDNKGNN 
Sbjct: 181 GSMSVSIIK3GYIALGIPMEYIjLIASTMVPIGSILIAKILLPQTEPVQKIDDIKMDNKGNN 240 

Query: 241 AOTIDAIAEGASTGAQMAFSIGASLIAFWSLVSLINMMLSGLGIRLEQIFSYVFAPFGFL 300 

ANVIDAIAEGASTQftQMAFSIGASLIAFVGLVSLISMMLSGLGIRLEQIFSYVFAPFGFL 
Sbjct: 241 ANVIDAIAEGASTGAQMRFSIGASLIAFVGLVSLINMMLSGLGIRLEQIFSYVFAPFGFL 300 

Query: 301 MGFDHKNILLEGNLLGSKLILNEFVSFQQLGDLIKSLDYRTALVATISLCGFANLSSLGI 360 

MGFDHKNILLEGNLLGSKLILNEFVSFQQLG LIKSLDYRTALVATISLCGFANLSSLGI 
Sbjct: 301 MGFDHKNILLEG^^iGSKllIrJ!ffiFVSFQQIflHLIKSIlDYRTALVATISLCGFAraJSSI^^ 360 

Query: 361 CVSGIAVLCPEKRGTLARLVFRAMIGGIAVSMLSAFIVGIVTLF 404 

CVSGIAVLCPEKR TLARLVFRAMIGGIAVSMLSAFIVGIVTLF 
Sbjct: 361 CVSGIAVLCPEKRSTLftRLVFRftMIGGIAVSMLSAPIVGIVTLF 404 

A related GBS gene <SEQ ID 8955> and protein <SEQ ID 8956> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 1 
McG: Discrim Score: 13.83 
GvH: Signal Score (-7.5): -2.63 
Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 8 value: -9.45 threshold: 0.0
modified ALOM score: 2.39



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4779(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>
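The "Final Results" block above lists one certainty value per predicted location. As an illustration only, the following Python sketch reads such a block and selects the location with the highest certainty. The function names `parse_final_results` and `most_likely_site` are hypothetical and are not part of the program that produced the output; the line layout is assumed to be `bacterial <site> --- Certainty=<value>(<call>)`.

```python
import re

# One result line, assumed layout: "bacterial <site> --- Certainty=<value>(<call>) < succ>"
_RESULT = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([0-9.]+)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples found in the text."""
    return [
        (m.group(1), float(m.group(2)), m.group(3).strip())
        for m in _RESULT.finditer(text)
    ]

def most_likely_site(results):
    """Return the tuple with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda r: r[1])

block = """\
bacterial membrane --- Certainty=0.4779(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>
"""
results = parse_final_results(block)
best = most_likely_site(results)
print(best)
```

For the block shown, the selected location is the bacterial membrane, in agreement with the "Affirmative" call in the output.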



The protein has homology with the following sequences in the databases: 

ORF01522(313 - 1512 of 1812) 

GP|9656920|gb|AAF95495.1| |AE004305(1 - 418 of 418) NupC family protein {Vibrio cholerae}

%Match =24.0 

%Identity =39.5 %Similarity =65.7 

Matches = 160 Mismatches = 134 Conservative Sub.s = 106 



276 306 336 366 396 426 456 486 

C*STPHTY*K**ITISEVIiEVIMQFIYSIIGILLVLGIVyAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQWSV 

I : hlh ::||| :| llh-l =1 h =11 : == :| ||::: 
MSLFMSLIGMAVLLGIAVLLSSNRKAINLRTVGGAFAIQFSLGAFILYVPWGQELLRG 
10 20 30 40 50 



516 546 591 621 651 681 711 

VSTGVTKVINCGQAGLNFVFGSLADSG AKTGFIFAIQTLGNIVFLSALVSLLYYVGILGFWKWIGKGVGKIMKS 



WO 02/34771 PCT/GB01/04789

FSDAVSlWIISnfGmGTSFLFGGLVSGKMFEVFGGGGFIFAFRVLPTLlFFSALISVLYYLGVMQWIRILGGGLQKALGT 
70 80 90 100 110 120 130 

741 771 801 831 861 891 921 951 

SEVESFVAVAI*lFLGQTDSPILVSKYLGRMTDSEIMVVLVSGMGSMSVSILGGYIMK3IPMErinLLIAS™VPIGS

I II I ||:|:|||::|::| :: :|l ||: 1= 1= | = : :| || :|ll= III I =1 II 

SRAESMSaAMIFVGQTEAPLVTOPFVPKMTQSELFAVM03GI^IflGGVIJ«3YASMGVKIEyLVAASF^^ 
150 160 170 180 190 200 210 

981 1011 1038 1068 1098 1128 1167

ILLPQTEPVQKIDDIKMDNKGNN-ANVIDAIAEGASTGAQMAFSIGASLIAFVGLVSLINMMLSGLG IRLEQI 

:::|:|| I =11 :| : llllll I III I |:|:::|l lllhlhHIl II hi ::|l = 

LMMPETEKPQDlffiDITLIXMDDKPANVIDAaaGGASAGLQIJffiK^^ 

230 240 250 260 270 280 290 


1197 1227 1257 1287 1305 1332 1362 1392 

FSYVFAPFGFLMGFDHKNILLEGNLLGSKLILNEFVSFQQ LGDLIKS-LDYRTALVATISLCGFANLSSLGICVSG 

: =:|l|: 11=1 = I -M = 1111== I I = I =1 = : =111111111: I = I 

LGWLFAPLAFLIGVPWNEATVAGEFIGLKTVftNEF^aySQFAPYLTEAAPVVLSEKTKAIISFAL 
310 320 330 340 350 360 370

1422 1452 1482 1512 1542 1572 1602 1632 

IAVLCPEKRGTI^LVFRAMIGGIAVSMLSAFIVGIVTLF*KLTKERRIVTWK*KIF*KR*TILC*QC3QQHGQKSKQP*M 

, : I |:=ll :|h =1=1 I ::==l III 
LGSLAPKRRGDIARMGVKAVIAGTLSNLMAATIflGPFLSF
390 400 410 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2019

A DNA sequence (GBSx2130) was identified in S.agalactiae <SEQ ID 6247> which encodes the amino 
acid sequence <SEQ ID 6248>. This protein is predicted to be deoxyribose-phosphate aldolase (deoC). 
Analysis of this protein sequence reveals the following: 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2196(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA81646 GB:Z27121 deoxyribose aldolase [Mycoplasma hominis]
Identities = 99/199 (49%) , Positives = 140/199 (69%) , Gaps = 1/199 (0%) 

Query: 5 DILKITOHTLIATTATWPEIQTILDnamYETASACIPASYVKKAaEYVSGK-LAICTVI 63
++ K +DHT L+ +AT +1 ++ +A+ y+ S CI SYVK A E + + +CTVI
Sbjct: 3 EliNKyiDHTlSmSPSATSKDIDKLIQEAIKYDFKSVCIAPSYVKYAKEALKNSDVLVCTVI 62

Query: 64 GFPNGYSTTAAKVFECQDAIKNGADEIDMVINLTDVKNGDFDTVEEEIRQIKAACQDHIL 123
GFP Gy+ T+ KV+E + A+++GaDEIDMVIN+ K+G ++ V EI+ IK AC L
Sbjct: 63 GPPLGyNATSVKVYETKXRVEHGADEIDMVINVGRFKDGQyEYVIiNEIKM 122

Query: 124 KVIVETCQLTKEELIELCGWTRSGADFIKTSTGFSTAGATFEDVEVMAKYVGEGVKIKA 183
KVIVET LTK ELI++ +V +SGADFIKTSTGFS GA+FED++ M + G+ + IKA
Sbjct: 123 KVIVETALLTKAELIKITELVMQSGADFIKTSTGFSYRGASFEDIQTMKETCGDKLLIKA 182

Query: 184 AGGISSLEDAEKFIALGAS 202
+GGI +L DA++ I LGA+
Sbjct: 183 SGGIKNLADAQEMIRLGAN 201

A related DNA sequence was identified in S.pyogenes <SEQ ID 6249> which encodes the amino acid
sequence <SEQ ID 6250>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2196(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 211/223 (94%) , Positives = 217/223 (96%) 

Query: 1 MEVKDILKTVDHTLLATTATWPEIQTILDDAMAYETASACIPASYVKKAAEYVSGKLAIC 60
+EVKDILKTVDHTLLATTATWPEIQTILDDAMAYETASACIPASYVKKAAEYVSGKLAIC

Sbjct: 1 VEVKDILKTVDHTLiaiTTATWPEIQTILDDflMAYETASACIPASYVKKaAEYVSGKLAIC 60 

Query: 61 TVIGFPNGYSTTAAKVEECQDRIKNGRDEIDMVINLTDVKNGDFDTVEEEIRQIKRACQD 120 
TVIGFPNGYSTTAAKVFECQDAI+NGaDEIDMVINLTDVKNGDFDTVEEEIRQIKA CQD 
Sbjct: 61 TVIGFENGYSTTAAKVFECQDAIQNGRDEIDMVINLTDVKNGDPDTVEEEIRQIKAKCQD 120

Query: 121 HILKVIVETCQLTKEELIELCGWTRSGADFIKTSTGFSTAGATPEDVEVMAKYVGEGVK 180 

HILKVIVETCQIiTKEELIELCGVVTRSGADFIKTSTGFSTAGATFEDVEVMaKYVGEGVK 
Sbjct: 121 HILKVIVETCQLTKEELIELCGVOTRSGftDFIKTSTGFSTi«3ATFEDVEVMAKYVGEGVK 180 


Query: 181 IKAAGGISSLEDAEKFIALGASRLGTSRIIKIVKNQKVEEGTY 223 

IKAaGGISSLEnA+ FIALGRSRLGTSRIIKIVKN+ + +Y 
Sbjct: 181 IKRAGGISSLEDAKTFIALGASRLGTSRIIKIVKNEATKTDSY 223 
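Each alignment in this document is introduced by a summary line of the form "Identities = 211/223 (94%) , Positives = 217/223 (96%)". Purely as an illustration, the following Python sketch extracts the counts and the reported percentages from such a line and derives the identity fraction from the counts. The function name `parse_alignment_summary` is hypothetical; the reported percentages are taken from the line as printed and are not recomputed, since the rounding used by the alignment program is not stated here.

```python
import re

# One "Name = n/m (p%)" field of a BLAST-style summary line.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_alignment_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in _FIELD.findall(line)
    }

summary = parse_alignment_summary(
    "Identities = 211/223 (94%) , Positives = 217/223 (96%)"
)
identical, aligned, reported = summary["Identities"]
identity_fraction = identical / aligned  # fraction of aligned positions that are identical
print(summary, round(identity_fraction, 3))
```

A "Gaps" field is returned only when the summary line contains one, as in the GENPEPT alignments elsewhere in this document.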

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2020 

A DNA sequence (GBSx2131) was identified in S.agalactiae <SEQ ID 6251> which encodes the amino 
acid sequence <SEQ ID 6252>. This protein is predicted to be phosphopentomutase (deoB). Analysis of
this protein sequence reveals the following:

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.0546(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC45496 GB:U80410 phosphopentomutase [Lactococcus lactis subsp.
cremoris]

Identities = 275/408 (67%) , Positives = 325/408 (79%) , Gaps = 7/408 (1%) 

Query: 3 QFDRIHLWLDSVGIGAAPDANDFVNAGVP DGASDTLGHISKTVGLAVPNMAKI 56 

+F RIHLW+DSVGIGAAPDA+ F N V D SDT+GHIS+ GL VPN+ K+

Sbjct: 4 KFGRIHLVV^roSVGIGAAPDftDKPFlSlHDVETHEAINDVKSDTIGHISEIRGLDVPNLQKL 63 

Query: 57 GiaJIPRPQRLKTOPAEENPSGYATKLQEVSLGKDTMTGHVffilMGLNITEPFDTFVmGF^ 116 
G GNIPR LKT+PA + P+ Y TKL+E+S GKDTMTGHWEIMGLNI PF T+ G+P 
Sbjct: 64 GWGNIPRESPLKTIPAAQKPAAYVTKLEEISKGKDTMTGHWEIMGLNIQTPFPTYPEGYP 123



Query: 117 EDIITKIEDFSGRKVIREANKPYSGTAVIDDFGPRQMETGELIIYTSftDPVLQIAAHEDI 176 
ED++ KIE+FSGRK+IREflNKPYSGTAVI+DFGPRQ+ETGELIIYTSflDPVLQIAftHED+
Sbjct: 124 EDLLEKIEEFSGRKIIREANKPYSGTAVIEDFGPRQLETGELIIYTSADPVLQIAAHEDV 183

Query: 177 IPLEELYRICEYARSITMERPALL-GRIIARPYVGEPGNFTRTANRHDYAVSPFEDTVLN 235 

I EELY+ICEY RSIT+E ++ GRIIARPYVGE GNF RT R DYA+SPP +TVL 
Sbjct: 184 ISREELYKICEYVRSITLEGSGIMIGRIIARPYVGEacaSTFERTDGRRDYALSPFAETVLE 243 

Query: 236 KLDQAGIDTYAVGKINDIFNGSGINHDMGHNKSNSHGIDTLIKTMGLSEFEKGFSFTNLV 295 

KL +AGIDTY+VGKI+DIFN G+ +DMGHN ++ G+D L+K M +EF +GFSFTNLV 
Sbjct: 244 KLyKftGIDTYSVGKISDIE^^^VGVKYDMGH^ffiIro^ra3VDRLLKft^^"KTO 303 

Query: 296 DFDALYGHRRDPHGYRDCLHEFDERLPEIISAMRDKDLLLITADHGNDPTYAGTDHTREY 355 

DFDA YGHRRD GY + +FD RLPEII AM++ DLL+ITADHGNDP+Y GTDHTREY 
Sbjct: 304 DFDAKYGHRRDVEGYGKAlEDFDGRLPEIIDimEimLIMITRDHGNDPSYVGTraT^ 363 , 

Query: 356 IPLIAYSPSFTGNGLIPVGHFADISATVArJNFGVDTAMIGESFLQDLV 403 

1PL+ +S SF ++PVGHFADISAT+A+HF V A GESFL LV 
Sbjct: 364 IPLVXFSKSFKEPKVLPVGHFADISATIAEKFSVKKAQTGESFLDALV 411 

There is also homology to SEQ ID 2740: 

Identities = 348/402 (86%) , Positives = 374/402 (92%) 

Query: 1 MSQFDRIHLVVLDSVGIGAAPDAOTFVNAGVPIXSASDTI/SHISKWGriAVPNMAKIGLGN 60 

MS+F+RIHLVVLDSVGIGAAPDA+ F NAGV D SDTLGHIS+ GL+VPNMAKIGLGN 
Sbjct: 1 MSKimiHLVVLDSVGIGtfUiPDADKFENAGVaDTDSDTLGHISEAaGLSVPimaKIGLGN 60 

Query: 61 IPRPQRLKWPAEENPSGYATKLQEVSLGKDTNTraHWEIMGIiNITEPFDTFWNGFPEDII 120 

I RP LKTVP E+NP+GY TKL+EVSLGKDTMTGHWRIMGLNITEPFDTFWNGFPE+I + 
Sbjct: 61 ISRPIPLKTVPTEDNPTGYVTKLEEVSLGKDTMTGHWEIMGLNITEPFDTFWNGFPEEIL 120 

Query: 121 TKIEDFSGRKVIREANKPYSGTAVIDDFGPRQMETGELIIYTSADPVLQIAAHEDIIPLE 180 

TKIE+FSGRK+IREANKPYSGTAVIDDFGPRQMETGELI+YTSADPVLQIAAHEDIIP+E 
Sbjct: 121 TKIEEFSGRKlIREftNKPYSGTAVlDDFGPRQMETGEIiIVYTSADPVLQIAflHEDIIPVE 180 

Query: 181 ELYRICEYARSITMERPALLGRIIARPYVGEPGNFTRTANRHDYAVSPFEDTVIiNKLDQA 240 

ELY+ICEYARSIT+ERPALLGRIIARPYVG+PGNFTRTANRHDYAVSPF+DTVIJSI^ A 
Sbjct: 181 ELYKICEYARSITLERPALLGRIIARPYVGDPGNFTRTAimHDYAVSPFQDTVmK^^ 240 

Query: 241 GIDTYAVGKINDIFNGSGINHDMGHNKSNSHGIDTLIKTMGLSEFEKGFSFTNLVDFDAL 300 

G+ TYAVGKINDIENGSGI +DMGHNKSNSHGIDTLIKT+ L EF KGFSFTNLVDFDA 
Sbjct: 241 GVPTYAVGKIiroiENGSGITNDMGHNKSNSHGIDTLIKTLQLPEFTKGFSFra^ 300 

Query: 301 YGHRRDPHGYRDCLHEFDERLPEIISftMRDKDLLLITADHGNDPTYAGTDHTREYIPLLA 360 

+GHRRDP GYRDCLHEFD RLPEII+ M++ DLLLITADHGNDPTYAGTDHTREYIPLLA 
Sbjct: 301 FGHRRDPEGYRDCLHEFDNRLPEIIANMKEDDLLLITADHGNDPTYAGTDHTREYXPLLA 360 

Query: 361 YSPSFTGNGLIPVGHFADISATVADNFGVDTAMIGESFLQDL 402 

YS SFTGNGLIP GHFADISATVA+NFGVDTAMIGESFL h- 
Sbjct: 361 YSVSFTGNGLIPQGHFADISATVAENPGVDTAMIGESFLSHL 402 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2021 

A DNA sequence (GBSx2132) was identified in S.agalactiae <SEQ ID 6253> which encodes the amino 
acid sequence <SEQ ID 6254>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.05 Transmembrane 9 - 25 ( 4 - 35)

Final Results
bacterial membrane --- Certainty=0.5819(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 6255> which encodes the amino acid
sequence <SEQ ID 6256>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.57 Transmembrane 41 - 57 ( 38 - 60)

Final Results
bacterial membrane --- Certainty=0.3230(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9143> which encodes the amino acid sequence
<SEQ ID 9144>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 49 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.57 Transmembrane 13 - 29 ( 10 - 32)

Final Results
bacterial membrane --- Certainty= 0.323(Affirmative) < succ>
bacterial outside --- Certainty= 0.000(Not Clear) < succ>
bacterial cytoplasm --- Certainty= 0.000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 276/544 (50%) , Positives = 368/544 (66%) , Gaps = 5/544 (0%) 

Query: 5 FKKKWKVCLVIFGIVLVSLLSLGFFYFSKGQVLSRFVAARSRTSGQAFDNIKEYMVWSD 64 

F K +K +1 L L G FY+SK ++ ++ MS SG F+NIK Y+W D 
Sbjct: 33 FHHKKLKQITIIAATSLFLFLIGGAFYYSKNHCINAYLKARSAQSGPVFENIKAYLVWDD 92

Query: 65 TGESITNDEANYANFEPLSKSEARKLGQEIKEGNKNDSMYLKRVGSRLGIFPDYRIANKP 124 

T E ITNDEA Y F S+ E R+ Q++K +++ ++ +K VG R IFPDYRIA KP 
Sbjct: 93 TNEQITNDEAMYTKFRRYSQKELRQKKQDLKAASQDSAVQVKSVGRRFWIFPDYRIAIKP 152 

Query: 125 MSLTLKTNVPmmiLNQKKVATSNSDHFSVTVERLPRTimftSLEGTSDGKEIKLK^ 184 

M LT+KTNVP+ DVLLNQKKVA S+S+ FSV ++RLP YTAS+ G +G+ IK+ K Y 
Sbjct: 153 mLTIKTNVPQADVLLNQKKVAVSDSEQPSVKLDRLPTAEYTASIRGKHNGRNIKVN^ 212 

Query: 185 DGKNQTIDLSVAFKSFTVTSNLMDGNLYFGDNRIAICLJCDGSHSVENYPVTDGSKAYIKKV 244

DG N +DLSV+F++F VTSN G+LYF DN I LKDG VE+YPVT+ ++AY+K 
Sbjct: 213 DGDNPVLDLSVSFRTFLVTSNAKQGDLYFDDNHIGTLiODGQLQVEDYPVTENAQAYMKTT 272 

Query: 245 FNIX3EITSHKQKLISIADNQTIKLDVDGLLNEKH4GQKLITAFNQLILyVSTGQDPQTLG 304 

F.DGE+ S K L + + T+++ V LL E +AG+ L++AF+QL+ Y+STGQD L

Sbjct: 273 FPDGELRSQKYALADVEEGATLEILVTDLLEEDKAGELLVSAFDQLMHYLSTGQDSSNLR 332 

Query: 305 TVFEK3AENDFYKGLKESIKAKFVTDNRKASHFTIPNIVLNKMTQVGKESYQVNFAADYD 364 
+VFE G+ N FY+GLKESIKAKF TD RKAS IP+I+L MTQVGK +Y ++F A Y+ 
Sbjct: 333 SVFEAGSSNAFYRGLKESIKAKFQTDTRKASRLNIPSILLTTMTQVGKTTYVLDFTATYE 392

Query: 365 FNYDKSTDPDKKIYGHIIQNLTGNFIMKKSGNSYLISNDGKKDITVAKETNKVKADPVSI 424 

F YDKSTDP++ T GHI Q+LTG +KK G YLIS G K+ITV KE N++KA S+ 
Sbjct: 393 FLYDKSTDPEQHTSGHINQDLTGKVTVKKVGQHYLISQSGSKNITVVKEDNQLKAP--SV 450 






Query: 425 FPENLVGSWKGEVEDGTraiTFDKDGKVTQK-KOTKDSKSKESNHSAiCVTKLEDRSNGLY 483 

FPE+++G+W G+ ++ M+ DG +T K + K ++SKE+ +AK++K+KDK3NG Y 
Sbjct: 451 FPESILGTWTGQANGLSIHMSLASDGTITTKVEDQKGNRSKET-RTAKISKVEDKGNGFY 509 



Query: 484 LYQYESGTDTTTFV-TGGIGGLIWKYAYGIKIE6NKIIPVIWQTSSDGEFDYHKPLLSKP 542

LY + G+D + V GG+GG VKYAYG KI G PV+WQ + EFDY KPL 
Sbjct: 510 LYTPDPGSDISALVPEGGLGGANVKYAYGFKISGKTASPWWQAALTHEFDYTKPLSGVT 569

Query: 543 LTKQ 546
L KQ 

Sbjct: 570 LQKQ 573 

A related DNA sequence was identified in S.pyogenes <SEQ ID 9065> which encodes amino acid sequence 
<SEQ ID 9066>. An alignment of the GAS and GBS sequences follows: 
Score = 47.3 bits (110), Expect = 4e-07 

Identities = 65/303 (21%) , Positives = 119/303 (38%) , Gaps = 18/303 (5%) 

Query: 153 FYIMIGTSISIVVaLTRFVKEISLNFKEIKKLAKTKMGIEVLSEI^^ 209 

+YIL + T 1+ +V + +S F +KKL KM + +QI EF D+ 

Sbjct: 37 YYILSV-TIIACIVGGIVNLFLLSSVFTSLKKLKQKMKDISQRCFDTKAQICSPQEFKDL 95 

Query: 210 LRTLHIKGDNLKSLIEREILEKQDLSFQIAALSHDIKTPXXXXXXXXXXXXXXXXXXXQE 269

+ L+S + +++ + lA LSHDIKTP + 

Sbjct: 96 ETAraQMSSELESTFKSLNESEREKTMMIAQLSHDIKTPITSIQSTVEGILDGIISEEEV 155 

Query: 270 GYIVSMNNSlSVFEGYFNSLISyTRML SEDRSVKIiILVEELLSELHFEVDDL 321 

Y + N+IS N L+ + +E + I +++LL ++ E +

Sbjct: 156 NYYL---NTISRQTimiNHLVEELSFITLETMSDTAEPHKEETIYI)DK]:jLIDIL^^ 212 

Query: 322 IiNINNIEFSIC^^iI.IITSFYGDEEI^^:^IRALSNLLVmIRFMPVLDKKIEVILSESGEQIH 381 
N+I ++ +LRL NL+ NA ++ + + + + I 

Sbjct: 213 FEKENRQVMIDVAPDVSKLSSQYDKLSRILIOTiISNftXKYSDP-GSPLTIKAYSNRQDIV 271

Query: 382 FEIWNNGERFSDSTIiKKGDKLFYTEDYSRGNK--HYGIGLAFVKBVAIKHGG[in^R^ 439 

+I+G DL Y+SRK +G+GL + +A + G++ + + 

Sbjct: 272 IDIIDQGYGIKDEDLASIENRLYRVESSKNMKTGGHGLGLYIARQLAHQLNGDILVESQY 331 






Query: 440 RGG 442 
+ G 

Sbjct: 332 QKG 334 



A related sequence was also identified in GAS <SEQ ID 9135> which encodes the amino acid sequence
<SEQ ID 9136>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -3.56 Transmembrane 145 - 161 ( 145 - 164)

Final Results
bacterial membrane --- Certainty=0.2423(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

SEQ ID 6254 (GBS280) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 8; MW 63.7kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 58 (lane 7; MW 88.7kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2022 

A DNA sequence (GBSx2133) was identified in S.agalactiae <SEQ ID 6257> which encodes the amino 
acid sequence <SEQ ID 6258>. This protein is predicted to be ribosomal large subunit pseudouridine
synthase D (rluC). Analysis of this protein sequence reveals the following:
Possible site: 22
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4.62 Transmembrane 2 - 18 ( 1 - 19)

Final Results
bacterial membrane --- Certainty=0.2848(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12749 GB:Z99108 similar to hypothetical proteins [Bacillus subtilis]

Identities = 97/251 (38%) , Positives = 147/251 (57%) , Gaps = 15/251 (5%) 

Query: 86 KHVLINNEFINWQTWQENDTITLIFDDEDYPTKKIPLGRAKLIDCLYEDEHLIIVWKPE 145 
+ + +N+E + +V++ D + + +++ G +D L+ED H++I+NKP 

Sbjct: 43 QQIKVNHESVIiNl»lIVKKGDRWIDLQESEaSSVIPEYGE---IiDILFEDNHm 99

Query: 146 GMKTHGNQENEIALMJHVSAy SGQTCYV--VHRLDMETSGAVLFAKNPFILPLINQ 199 

G+ TH N+ + L ++ AY +G+TC V VHRLD +TSGA++FAK+ +++Q 
Sbjct: 100 GIATHP^mDGQTGTLaNLIAYHyQINGETCKVRHVHRIlDQDTSGAIVFAKHRI^^ 159 


Query: 200 RIjERKEIWREYWALVEGKFSPKHQVLRDKIGRNR-HDRRKRIIDSKNGQHAMTIIDVL-- 256 

+LE+K + R Y A+ EGK K + IGR+R H R+R+ S GQ A+T V+ 
Sbjct: 160 QLEKKTLKRTYTAIAEGKLRTKKGTINPPIGRDRSHPTRRRV--SPGGQTAVTHFKVMAS 217 

Query: 257 KYIQNSSLIKCRLETGRTHQIRVHLSHHGHPLIGDPLYNPSSN-NERLMLHftHRLTLSHP 315

+ SL++ LETGRTHQIRVHL+ GHPL GD LY S R LHA+++ HP 
Sbjct: 218 NAKERLSLVEIiELETGRTHQIRVHLASIXSHPLTODSLYGGGSKLIiNRQRLHKHKVQKVHP 277 

Query: 316 LTCETISVEAP 326 
+T E I EAP

Sbjct: 278 ITDELIVREAP 288 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6259> which encodes the amino acid
sequence <SEQ ID 6260>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4198(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 172/278 (61%) , Positives = 212/278 (75%) , Gaps = 2/278 (0%) 



Query: 63 TVKELLEDYPLIPRKIRHPLRVKKHVLINNEFINWQTVVQENDTITLIFDDEDYPTKKIP 122
TVK LLE+ LIPRKIRHFLR KKHVLIN +NWQ+ V+ D + L FD EDYP K I
Sbjct: 2 TVKALLEEQLLIPRKIRHFLRTKICHVLINGHSVNWQSCVKYGDQVKLFFDHEDYPEKIIV 61

Query: 123
+G+AE + CLYEDEH+IIVNKPEOIKTHGN P E+ALIJSHVSAY+GQTCYWHRLD ETS
Sbjct: 62 MGQaEKVTCLYEDEHI 1 1 VNKPEGMRTHGNDPTELALIiNHVSAYTGQTCYWHRLDKETS 121

Query: 183
GA+LFAK PFILP++N+ LE+++I REY ALV G IGR+RHDRRKR++D
Sbjct: 122

Query: 243
NG+ A+T + ++K + + +SL+ C+Ii+TGRTHQIRVHL+H GH L GDPLY N
Sbjct: 182

Query: 301
RLMLHA++L L HPLT E I V+A S+TF+ +IjN K
Sbjct: 242

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2023 

A DNA sequence (GBSx2134) was identified in S.agalactiae <SEQ ID 6261> which encodes the amino 
acid sequence <SEQ ID 6262>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.02 Transmembrane 98 - 114 ( 93 - 119)

Final Results
bacterial membrane --- Certainty=0.4609(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF04735 GB:AF101780 penicillin-binding protein 2a 
[Streptococcus pneumoniae] 
Identities = 424/773 (54%) , Positives = 555/773 (70%) , Gaps = 47/773 (6%) 

Query: 2 KLFDKFIDLFRVDEDNDEMTRKNEQETREETSNLDGEEVYDIDDITRPSKSQYQRGIRHQ 61 

KI,F+KF+ LF+ +ETS L+ + I R S+S 
Sbjct: 5 KLFEKFLSLFK KETSELEDSD STILRRSRS 34 

Query: 62 KENAKSRPEWLQKVDRYLPSPKNPIRRFWRRYRIGKLLFIALMAFILIFGSYLFYLSKTA 121 

DR + PIR+FWRRY + K++ I ++ L+ G YLF ++K+ 
Sbjct: 35 DRKKLAQVGPIRKFWRRYHLTKIILILGLSAGLLVGIYLFAVAKST 80 

Query: 122 TVSDLQSALKTTTTIYDKNKEYAGKLSGQKiGTyVELNAISDHLKMAVIATEDRTFYENNG 181 

V+DLQ+ALKT T I+D+ ++ AG LSGQKGTYVEL IS +L+NAVIATEDR+FY+N+(3 
Sbjct: 81 NVITOLQNRLKTRTLIFDREEKEACSRLSGQKGTYVELTDISKNLQHRVIATEDR^ 140 

Query: 182 VNFKRFFLAVATLGKFGGGSTITQQLAKNAYLSQDQTIKRKAREFFLALELTKKYSKAEI 241 

+N+ RFFI1A+ T G+ GGGSTITQQLAKNAYLSQDQT++RKA+EFFLALEL+KKySK +1 
Sbjct: 141 INYGRFFLAIVTAGRSGGGSTITQQLAKmYLSQDQTVERKAKEFFLALELSKKYSKEQI 200 

Query: 242 LTMYLNNSYFGNGVWGVEDASRKYFGTSAANLTVDEAATLAGMLKGPEVYNPYYSVENAT 301 

LTMYLNN+YFGNGVWGVEDAS+KYFG SA+ +++D+AATIiAGMLKGPE+'XIlP SVE++T 
Sbjct: 201 LT^^^L^Ili[AYFGNGVWGVEDASKKYFGVSASEVSLDQaATIAG^fl:^K!GPELyNPI^^ 260 

Query: 302 NRRDTVIAAMVDAGKI.TKSQAKEAASIGMKNRIiADTYflGKIl!ro 361 

NRRDTVL MV AG + K+Q EAA + M ++L D Y GKI+DYRYPSYFDAVVNEA+ 
Sbjct: 261 NRRDTVLQNMVAAGYIDKNQETEAAEVDMTSQLHDKYEGKISDYRYPSYFDAWNEAVSK 320 

Query: 362 YGISEKDIVlSraGYKIYTALDQNYQSGMQKTFDDTSLFPVSDYDGQSAQQASVALDPICrGG 421 

Y ++E++IVNNGY+IYT LDQNYQ+ MQ +++TSLFP ++ DG AQ SVAL+PKTGG 
Sbjct: 321 YNLTEEEIVNNGYRIYTELDQtWQKNMQIVYEMTSLFPRRE-TCTFAQSGSV^ 379 

Query: 422 VRGLVGRVQSTKDAQFRSFNYATQSKRSPASTIKPLWYSPAIASGWSIDKELPNKVQDF 481 

VRG+VG+V FR+F3SIYATQSKRSP STIKPLWY+PA+ +GW+++K+L N + 

Sbjct: 380 VRGVVGQVADNDKTGFRNFKnfATQSKRSPGSTIKPLVVYTPAVEaGWAIiNKQI^ 439 

Query: 482 HGYKPSNYGGIET-ESIPMYQAUmSYNIPAVYTLDKLGINKaFTYGRKPGLNMSSANKE 540 

YK NY GI+T +PMYQ+LA S N+PAV T++ LG++KAF G KPGIiNM ++ 
Sbjct: 440 DSYKVDNYAGIKTSREVPMYQSIAESIiNLPAVATVNDIflVDKAFEAGEKFGIiJMEKVDRV 499 

Query: 541 LGVALGGSVTTNPLEMAQAYSTFAITOGIMHRAHLITRIETANGKLVKQFTDKPKRVISRS 600 

LGVALG V TNPL+MAQAY+ FAN+G+M AH I+RIE A+G+++ + KRVI +S 
Sbjct: 500 LGVALGSGVETNPLQMAQAYAAPANEGLMPEAHFISRIENASGQVIASHKNSQKRVIDKS 559 



Query: 601 VASKOTSMMLGTFSNGTAINANVYGYTMAGKTGTTETDFNPNLSGDQWVVGyTPDVVISQ 660 
VA KMTSMMLGTF+NGT I+++ Y MAGKTGTTE ENP + DQWV+GYTPDWIS
Sbjct: 560 VADKMTSMMLGTFTNGTGISSSPADYVMAGKTGTTEAVFNPEYTSDQWVIGYTPDWISH 619

Query: 661 WVGFKNTDKHHYLTDSSAGTASNIFSTQASYILPYTKGSSFTHIEmYFQNGIGSVYNAQ 720 

W+GP TD++HYL S++ A+++F A+ ILPYT GS+PT +ENaY QNGI + 
Sbjct: 620 WLGFPTTDENHYLAGSTSNGAAHWRNiaNTILPYTPGSTFT-VEmYKOTGIAPAOT 678 

Query: 721 DASNTTNQESRSIINDLKDSASKAAQDISRAVEDSNFQEKVKDAWNSLKDYFR 773 

N ++ ++D++ A + SRA+ D+ +EK + W+S+ + FR 

Sbjct: 679 QVQTNDNSCSTDDNLSDIRGRAQSLVDEaSRAISDAKIKEKAQTIWDSIVNLFR 731 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6263> which encodes the amino acid 
sequence <SEQ ID 6264>. Analysis of this protein sequence reveals the following: 
Possible site: 52 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.96 Transmembrane 104 - 120 ( 99 - 124)

Final Results
bacterial membrane --- Certainty=0.4185(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF04735 GB:AF101780 penicillin-binding protein 2a [Streptococcus pneumoniae]
Identities = 414/730 (56%) , Positives = 539/730 (73%) , Gaps = 17/730 (2%)

Query: 50 TKNSEQDPATALQRSFAYEGSPKSRPAWLQKLEAVLPSPQRPIRRFWRRYHIGKLLMILI 109 

T E +T L+RSR+ +KL V PIR+FWRRYH+ K+++IL 

Sbjct: 18 TSELEDSDSTILRRSRSDR KKLAQV GPIRKFWRRYHLTKIILIL6 62 


Query: 110 GTLVLLLGSYLFYLSKTAKVSDLQnaLKATTVIYDHKBKYAGSLSGQKGSYVELNAISDD 169 

+ LL+G YLF ++K+ V+DLQ+ALK T+I+D + + AG+LSGQKG+YVEL IS + 
Sbjct: 63 LSAGLLVGIYLFAVRKSTNVNDLQNALKTRTLIFDREEKEAGALSGQKGTYVELTDISKN 122 

Query: 170 LENAVIATEDRTFYSNSGINLKRFLLAWTAGRFGGGSTITQQLAKNAYLSQDQTIKRKA 229

L+NAVIATEDR+FY N GIN RF LA+VTAGR GGGSTITQQLAKNAyLSQDQT++RKA 
Sbjct: 123 LQNAVIATEDRSFYKNDGINYGRFFLAIVTAGRSGGGSTITQQLAKNAYLSQDQTVERKA 182 

Query: 230 REFFLALELTKKYSKKDILTMYLNNSYFGNGVWGVEDASQKYFGTTAANLTLDEAATLAG 289 
+EFFIiALEL+KKySK+ ILTMYLNN+YFGNGVWGVEDAS+KYFG +A+ ++LD+AATLAG

Sbjct: 183 KEFPLALELSKKYSKEQILTMYLNNAYFGNGVWGVEDASKKYFGVSASEVSLDQAATLAG 242 

Query: 290 MLKGPEIYNPYHSLKNATHRRDTVLGAMVDAKKITQTKAQQARAVGLJCNRLADTYVGKTD 349 
MLKGPE+YNP +S++++T+RRDTVL MVA I + + +A V + ++L D Y GK 
Sbjct: 243 mKGPELYNPIWSVEDSTNRRDTTOiQNMVAAGYIDKNQETEAAEVDMTSQLHDKYEGKIS 302

Query: 350 DYKYPSYFDAVISEAIATYGLSEKDIVNNGYK7YTELDQNYQTGMQTTENNDELFPVSAY 409 

DY+YPSYFDaV++EA++ Y L+E++IVNNGY++YTELDQNYQ MQ + N LFP A 
Sbjct: 303 DYRYPSYFmVVNEAVSKYlttTEEEIVimGYRIYTELDQtrrQAMQIVYEOTSLFP-RA^ 361 


Query: 410 DGSSAQAASVALDPKTGGVRGLIGRVNSSENPTFRSFNYATQAKRSPASTIKPLWYAPA 469 

DG+ AQ+ SVAL+PKTGGVRG++G+V ++ FR+FNYATQ+KRSP STIKPLWY PA 
Sbjct: 362 DGTFAQSGSVALEPKTGGVRGWGQVaDNDKTGFRNFNYATQSKRSPGSTIKPLVVYTPA 421 

Query: 470 VASGWSIEKELPiNTVQDFDGYQPHNY-GNYESEDVPMYQALaNSYNIPAVSTLNDIGIDK 528

V +GW++ K+L N +D Y+ NY G S +VPMyQ+LA S N+PAV+T+ND+G+DK 
Sbjct: 422 VEAGWAimQLimTMQSDSYKVDNYAGIKTSREVPMYQSLKESLI&PAVATVNDI^ 481 

Query: 529 AFTYGKTFGLDMSSAKiCELGVALGGSVTTNPLEMAQAYAAFflNNGVIHPAHLINRIENAR 588 
AF G+ FGL+M + LGVALG V TNPL+MAQAYAAFAN G++ AH I+RIENA

Sbjct: 482 AFEAGEKFGLNMEKVDRVIXSVALGSGVETNPLQMAQAYAAFANEGLMPEAHFISRIENAS 541 

Query: 589 GEVLKTFTDKftKRWSQSVADKOTAMMLGTFSNGTAVNANVYGYTIAGKTGTTETNFNPD 648 
G+V+ + + KRV+ +SVADKMT+MMLGTF+NGT ++++ Y +AGKTGTTE FNP+ 
Sbjct: 542 GQVIASHKNSQKRVIDKSVADKMTS^mLGTF'^SrGTGISSSPADYVMAGiCTGTTEAVFNPE 601

Query: 649 LAGDQWVIGYTPDWISQWVGFNQTDENHYLTDSSAGTASAIFSTQASYILPYTKGSQFH 708

DQWVIGYTPDWIS W+GF TDEINHYL S++ A+ +F A+ IIiPYT GS F 
Sbjct: 602 YTSDQWVIGYTPDWISHWLGFPTTDENHYLaGSTSNGaAHVPRNIANTILPYTPGSTFT 661 


Query: 709 VDmYAQNGISAVyGVNETGNQSGVDTQSIIDGLRKSAQEASQSLSKAVDQSGLRDKRQS 768 

V+NAY QNG1+ + T + +R AQ S+A+ + +++KAQ+ 

Sbjct: 662 VENAYKQNGIAPANTKRQVQTNDNSQTDDNLSDIRGRAQSLVDEASRAISDAKIKEKAQT 721

Query: 769 IWKEIVDYFR 778

IW IV+ FR 
Sbjct: 722 IWDSIVNLFR 731 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 530/715 (74%) , Positives = 623/715 (87%) , Gaps = 1/715 (0%)

Query: 59 RHQKENAKSRPEmQKVDRYLPSPKNPIRRFWRRYRIGKLLFIALMAFILIPGSYLPYLS 118 

R + + KSRP WLQK++ LPSP+ PIRRFWRRY IGKLL I + +L+ GSYLFYLS 
Sbjct: 65 RAYEGSPKSREAWLQKLEAVLPSPQRPIRRFWRRYHIGKLLMILIGTLVLIiLGSYLFYLS 124 


Query: 119 KTATVSDLQSALKTTTTIYDKNKKYAGKLSGQKGTYVEmAISDHLKNAVIATEDRTFYE 178 

KTA VSDLQ ALK TT I YD EYAG LSGQKG+YVELNAISD L+NAVIATEDRTFY 
Sbjct: 125 KTAKVSDLQDALKATTVIYDHKGEYAGSLSGQKGSYVELNAISDDLENAVIATEDRTFYS 184 

Query: 179 NI«3VNFKRFFIAVATLGKPGGGSTITQQUUa!5AYLSQDQTIKRKaREFFI^ 238

N+G+N KRF LAV T G+FGGGSTITQQLAKNAYLSQDQTIKRKAREFFLaiiELTKKySK 
Sbjct: 185 NSGIlSn^KRFIiLAVVTAGRFGGGSTITQQLApiAYLSQDQTIKRKaREFFr>ALELTKKYS 244 

Query: 239 AEILTMYLNNSYFGNGVWGVEDASRKYFGTSAANLTTOEAATLAGMLKGPEVYNPYYSVE 298 
+ILTMYLNNSYFGNGVWGVEDAS+KYFGT+AANLT+DEAATLAGMI.KGPE+YHPY+S++

Sbjct: 245 KDILTMYLNNSYFGNGVWGVEDRSQKYFGTTAaNLTLDEAATLAGMLKGPEIYNP^ 304 

Query: 299 NATORRDTVLAaMVDaGKLTKSQAKEAASIGMKNRLAD^^ 358 
NAT+RRDTVL AMVDA K+T+++A++A ++G+KNRLADTY GK +DY+YPSYFnA.V++EA 
Sbjct: 305 NATHRRDTVLGaMVDAKKITQTKAQQARAVGLKNRLADTYVGKTDDYKYPSYro^ 364

Query: 359 IDTYGISEKDIVNNGYKIYTALDQNYQSGMQKTFDDTSLFPVSDYDGQSAQGASVALDPK 418

I TYG+SEKDIVMSIGYK+yT LDQNYQ+GMQ TF++ LFPVS YDG SAQ ASVALDPK 
Sbjct: 365 lATYGLSEKDIVNNGYKVYTELDQNYQTGMQTTFNNDELFPVSAYDGSSAQAASVALDPK 424 


Query: 419 TGGVRGLVGRVQSTKDAQFRSFNYATQSKRSPASTIKPLVVYSPAIASGWSIDKELPNKV 478 

TGGVRGL+GRV S+++ FRSFNYATQ+KRSPASTIKPLWY+PA+ASGWSI+KELPN V 
Sbjct: 425 TGGVR6LIGRVNSSENPTFRSFim.TQAKRSPASTIKPLVVYAPAVaSGWSIEKELPNTV 484 

Query: 479 QDFHGYKPSNYGGIETESIPMYQALANSYNIPAVYTLDKLGINKAFTYGRKFGLNMSSAN 538

QDF GY+P NYG E+E +PMYQAIjAlISYNIPAV TL+ +GI+KAFTYG+ FGL+MSSA 
Sbjct: 485 QDFDGYQPHNYGNYESEDVPMYQAIANSYNIPAVSTltroiGIDKAFTYGKT 544 

Query: 539 KEIKWALGGSVTTNPLEMAQAYSTFANIXSIMHRflHLITRIETANGKLVKQFTO 598 
KELGVRLGGSVTTNI'LEMAQAY+ FAN+G++H AHIiI RIE A G+++K FTDK KRV+S

Sbjct: 545 KELGVAI^SVTTNPLEI^QAYAAFAlWrGVIHPAHLINRIENaRGEVLKTFTDKAKRVVS 604 

Query: 599 RSVASKMTSMMLGTFSNGTAINANVYGYTMAGKTGTTETDFNPNLSGDQWWGYTPDWI 658 
+SVA KMT+MMLGTFSNGTA+NfiNVYGYT+AGKTGTTET+FNP+L+GDQW+GYTPDWI 
Sbjct: 605 QS\MKMTAI#(II/3TFSNGTAVNANVYGYTLAGKTGTTETNENPDLAGDQW 664

Query: 659 SQWVGFraraJKHHYLTDSSAGTASNIFSTQASYILPYTKGSSFTHIENAYFQNGIGSVYN 718 

SQVJVGF TD++HYLTDSSAC3TRS IFSTQASYILPYTKGS F H++NAY QNGI +VY 
Sbjct: 665 SQWVGENQTDENHYLTDSSAGTASAIFSTQRSYILPYTKGSQF-HVDNAYAQNGISAVYG 723 






Query: 719 AQDASNTXNQESRSIINDLKDSASKAAQDISRAVEDSNFQEKVKDAWNSLKDYFR 773 

+ N + +++SII+ L+ SA +A+Q +S+AV+ S ++K + W + DYFR 
Sbjct: 724 VNETGNQSGVDTQSIIDGLRKSAQEASQSLSKAVDQSGLRDKAQSIWKEIVDYFR 778 



SEQ ID 6262 (GBS397d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 153 (lane 13; MW 76kDa) and in Figure 184 (lane 9; MW 76kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2024 

A DNA sequence (GBSx2135) was identified in S.agalactiae <SEQ ID 6265> which encodes the amino 
acid sequence <SEQ ID 6266>. This protein is predicted to be M-like protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.56 Transmembrane 609 - 625 ( 599 - 628) 
INTEGRAL Likelihood = -0.00 Transmembrane 19 - 35 ( 19 - 35) 
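The INTEGRAL lines above each report a likelihood value together with a predicted transmembrane segment and, in parentheses, an extended range. As an illustration only, the following Python sketch reads such lines and returns the segment with the lowest likelihood value. The function name `parse_integral_lines` is hypothetical, and treating the lowest value as the strongest prediction is an assumption made for this sketch rather than a statement of how the prediction program works.

```python
import re

# Assumed layout: "INTEGRAL Likelihood = <value> Transmembrane <a> - <b> ( <c> - <d>)"
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral_lines(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for m in _INTEGRAL.finditer(text):
        core = (int(m.group(2)), int(m.group(3)))
        segments.append({
            "likelihood": float(m.group(1)),
            "core": core,                                   # predicted segment
            "extended": (int(m.group(4)), int(m.group(5))),  # range in parentheses
            "core_length": core[1] - core[0] + 1,            # residues, inclusive
        })
    return segments

report = """\
INTEGRAL Likelihood =-10.56 Transmembrane 609 - 625 ( 599 - 628)
INTEGRAL Likelihood = -0.00 Transmembrane 19 - 35 ( 19 - 35)
"""
segments = parse_integral_lines(report)
# Assumption for this sketch: the lowest (most negative) value is the strongest call.
strongest = min(segments, key=lambda s: s["likelihood"])
print(strongest)
```

For the two lines shown, the segment at residues 609 - 625 is selected, and both predicted segments are 17 residues long.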



Final Results 

bacterial membrane --- Certainty=0.5225(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB91647 GB:AJ130830 cell wall protein, putative [Zea mays]
Identities = 106/182 (58%) , Positives = 123/182 (67%) , Gaps = 8/182 (4%) 

Query: 396 KEDKKPDVKPEAKPEAK--PDVKPEAKPDVKPEAKPIJVKPEaKPDVKPEAKPDV--KPEA 451 

K + KP+ KPE KPE K P KPE KP+ KPE KP+ KPE KP KPE KP+ KPE 
Sbjct: 116 KPEPKPEPKPEPKPEPKIKPKPKPEPKPEPKPEHKPEPKPEPKPKPKPEPKPEPQPKPEP 175

Query: 452 KPDVKPKAKPDVKPEA.--KPDVKPDVKPDVKPEA.--KPEDKPDVKPDVKPE2UCPDVKPEA 507 

KP+ KP+ KP+ KPE KP+ KP+ KP+ KPE KPE KP+ KP+ KPE KP+ KPE 
Sbjct: 176 KPEPKPEPKPEPKPEPQPKPEPKPEPKPEPKPEPQPKPEPKPEPKPEPKPEPKPEPKPEP 235 

Query: 508 KPEAKPEAKPEAKPEAKPEAKPDVKPEAKPDVKPEAKPEAKPEAKSEAKPEAKLEAKPEA 567 

KPE KPE +PE KPE KPE KP P+ +P KPE KPE KPE K E KPE K E KPE 
Sbjct: 236 KPEPKPEPRPEPKPEPKPEPKPKPDPKPEPQPKPEPKPEPKPEPKPEPKPEPKPEPKPEP 295 

Query: 568 KP 569 
KP 

Sbjct: 296 KP 297 

There is also homology to SEQ ID 822. 

A related GBS gene <SEQ ID 8957> and protein <SEQ ID 8958> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 

McG: Discrim Score: -5.20 

GvH: Signal Score (-7.5): 3.07 
Possible site: 27 

>>> Seems to have no N-terminal signal sequence

ALOM program count: 2 value: -10.56 threshold: 0.0

INTEGRAL Likelihood =-10.56 Transmembrane 609 - 625 ( 599 - 628)
INTEGRAL Likelihood = -0.00 Transmembrane 19 - 35 ( 19 - 35) 
PERIPHERAL Likelihood = 8.54 139 
modified ALOM score: 2.61 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.5225(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

LPXTG motif: 596-600

The protein has homology with the following sequences in the databases:

ORF00748(313 - 2190 of, 2490) 

GP|2462785|gb|AAB71985.l| |073163(3 - 374 of 374) M-like protein {Streptococcus egui} 
5 %Match =9.2 

%Identity = 36.0 %Similarity = 55.4 

Matches = 126 Mismatches = 147 Conservative Sub.s = 68 



282 312 342 372 402 432 462 666 
10 LS**IRIFN*LYKGftNMIlMlTOKiamyFLRKTAYGLASMSAAF THA 

:|::h:|||:|:||lhllh I = I Ih 

MAKKEMKFYLRKSAFGLASVSAftLLVGAARVSADS - - 

10 20 30 

15 696 726 756 786 813 843 870 900 

KVSDQELGKQSRRSQDIIKSIiGFIiSSDQKDIIArKSISSSK-DSQLILKFVTQATQLNNRESTKAK-QMR^ 

::| I :: | | :::| :: h I = Ih II hi = , 

VESAGPTOVAVTDSI^SEAAATKaEaDLVaAKaDLftAREVaiTAaKaEFDTftQaDl^TA^ 

40 50 60 70 80 90 

20 

921 951 981 1011 1041 1071 1101 1131 

PEV- - -LEEYKEKIQRASTKSQVDEFVAEAKKVVNSNKETIiWQMGKKQEIAIOjENLSNDEMIiRYimiir^^ 
h = I =:|ll I I : : hi : ::: I : ::| I I : : |:| : | 

AELEQKIPELEKKIQEAQEKLiraaTOPS-PKRVGSDDEDDTVaRKLMSEKEALKaE LQKTKEftLDTAKRAYAGI 

25 110 120 130 140 150 160 170 



1161 1191 1221 1251 1281 1311 1538 1668 

KliNITAAMNAIiNSIKQAAQEVAQKNLQKQYAKKIERISSKGLALSKKAKEIYEKHKSILPTP AKPDVKPEAKPDVK 

I h: :| I :h I I : || 

30 EERKQVAATKLDAaNKAFAGVEEKHAQftMAAFGAAFAAYKGA 

190 200 210 



1698 1740 1770 1800 1830 

PKRKPDVKPERKPDVKPD VKPDVKPEAKPEDKPDVKPDVKPEAKPDVKPEAKPE 

35 INI I IhhIIIIIII : Ih III Ih III III 

VKAELKAAGASDFYTKKIDSADTVDGVKTLREMILDSIAKPEVEPEAKPEPKLEPKPEPKPEPKPEPKPEPKPE 

220 230 240 250 260 270 280 

1860 1890 1920 1950 1980 2010 2040 2070 

40 AKPEAKPEAKPEAKPEAKPDVKPEAKPDVKPEAKPEAKPEAKSEAKPEAKLEAKPEAKPATKKSVOTSGISnJiAK^ 

III III III III Ih Ih :| I mil III I 

PKPEPKPEPKPEPKPEPKPEPKPKPQPKPAPAPKPEftKKEEKKAAP r K 

300 310 320 330 

45 2100 2130 2160 2190 2220 2250 2280 2310 

KYSKKLPSTGEAASPLLAIVSLIVMLSAGLITIVLKHKKN*IYF*T*TERSILSKS*GKPHQNFAFFI*ILE*FSRYEN* 

: : llllllll :h: =1 II Ih = = hi 
QDTNKLPSTGEATNPFFTftAALAVMaGftGVAAVSTRRKEN 

350 360 370 


SEQ ID 6266 (GBS3) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 3 (lane 5; MW 65kDa). The GBS3-His fusion product was purified (Figure 189, 
lane 8) and used to immunise mice. The resulting antiserum was used for FACS (Figure 261), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



WO 02/34771 PCT/GB01/04789

Example 2025 

A DNA sequence (GBSx2136) was identified in S.agalactiae <SEQ ID 6267> which encodes the amino 
acid sequence <SEQ ID 6268>. This protein is predicted to be transcription antitermination protein nusg 
(nusG). Analysis of this protein sequence reveals the following: 

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3203 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
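
The "Final Results" block has a regular shape: one line per candidate localisation, each with a certainty score and a verdict. As a hedged illustration (not the prediction program itself), the sketch below reads such a block and reports the highest-scoring site. The names `parse_final_results` and `best_site` are hypothetical, and the sample text assumes the cleaned-up layout of the block above.

```python
import re

# One localisation line: site name, certainty score, verdict in parentheses.
RESULT = re.compile(r"bacterial\s+(\w+)\W+Certainty=([0-9.]+)\s*\(([^)]+)\)")

def parse_final_results(text):
    """Return a list of (site, certainty, verdict) tuples in input order."""
    return [(m.group(1), float(m.group(2)), m.group(3))
            for m in RESULT.finditer(text)]

def best_site(results):
    """Site with the highest certainty score."""
    return max(results, key=lambda r: r[1])[0]

report = """
bacterial cytoplasm --- Certainty=0.3203 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(report)
print(best_site(results))
```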

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA53738 GB:X76134 nusG [Staphylococcus carnosus]
Identities = 90/175 (51%), Positives = 118/175 (67%), Gaps = 2/175 (1%)

Query: 7 KGWFVLQTYSGYENKVKENLLQRAQTYNMLDNILRVEIPTQTVNVEKNGKTKEIEENRFP 66 

K W+ + TYSGYENKVK+NL +R ++ NM + I RV IP + K+GK K++ + FP 

Sbjct: 8 KRWYAVHTYSGYENKraasmEKRVESMNMTEQIFRWIP 67 

20 

Query: 67 GYVLVEMWTDEAWFVVROTENVTGFVGSHGNRSKPTPLLEEEIRSILISMGQTVpVFDT 126 

GYVLVE+VMTDE+W+WRNTP VTGFVGS G SKP PLL +E+R IL MG D 
Sbjct: 68 GYVLVELVMTDESMYVVRNTPGVTGFVGSAGAGSKPNPLLPDEVRFILKQMGMKEKTIDV 127 

25 Query: 127 NIKEGDWQIIDGAFIGQEGRWEIENNKVKL--MINMFGSETQAELELYQVAEL 179 

++ G+ V+I G F Q G V EIE +K iCL +++MFG ET E+E Q+ +L 
Sbjct: 128 EVEVGEQVRIKSGPFANQVGEVQEIEADKFKLTVLVDMFGRETPVEVEFDQIEKL 182 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6269> which encodes the amino acid
sequence <SEQ ID 6270>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3874 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 170/179 (94%), Positives = 178/179 (98%)

Query: 1 MLDSFDKGWFVLQTySGYENKVKENLLQRAQTYNMLDNILRVEIPTQTVNVEKNGKTKEI 60 

MLDSFDKGWFVLQTYSGYENKVKENliQRAQTYNMIijNILRVEIPTQTVlJV^ 
Sbjct: 6 MLDSFDKGWFVLQTYSGYENIOTCENLLQRAQTYNMLDNILRVEIPTQTVNVEKN^ 65 

45 

Query: 61 EENRFPGYVLVEMVMTDEAWFWRNTPNVTGFVGSHGNRSKPTPLLEEEIRSILISMGQT 120 

EENRFPGYVLVEMVMTDEAWFWRNTPNVTGFVGSHGNRSKPTPLLEEEIR+IL+SMGQT 
Sbjct: 66 EENRFPGYVLVEMV]m)EAWFVVRNTPNVTGFVGSHGNRSKPTPLLEEEIRAILLSMGQT 125 

50 Query: 121 VDVFDTNIKEGDWQIIDGAFIGQEGRWEIENNKVKLMINMFGSETQAELELYQVAEL 179 

+DVFDTNIKEGDWQIIDGAF+GQEGRWEIENNKVKLM+NMFGSET AE+ELYQ+AEL 
Sbjct: 126 IDVFDTNIKEGDVVQIlDGAFMGQEGRVVEIENHKVKIMIjmFGSETVAEVELYQIAEL 184 



55 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2026

A DNA sequence (GBSx2137) was identified in S.agalactiae <SEQ ID 6271> which encodes the amino 
acid sequence <SEQ ID 6272>. This protein is predicted to be a glycosyl transferase. Analysis of this 
protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1558 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus ducreyi]
Identities = 98/259 (37%), Positives = 155/259 (59%), Gaps = 10/259 (3%)

Query: 5 VALAVDSNYLDKaiiVTIKSICVYNRNITFYLENQOTPVEWTONINRKLEPIiGSKLINVK 64 
+ LA + +Y + L TIKSI ++N++I FYL N+D P EW +N KL L S++I++K+ 
20 Sbjct: 10 IVLAANQSYSEYILTTIKSIYLHNKHIRFYLimDYPTEWFDIIJJNKLRKIJlJSE 69 

Query: 65 YHYDIAHLTTFLTVS TWFRLFLADYIPSSRVLYLDSDIIWTNLDYLFELDFKGyYL 121 

N I + T+ +S T+FR F++D+I +V+YLD+DI+VN +L L++ D Y+L 
Sbjct: 70 TNDTIKNFKTYSHISSDTTFPRYFISDFIEQDKVIYLnaDIVVNGSLTELYQTDISNYFL 129 

25 

Query: 122 AAVKDPHKISIE EGGFNAGMLLANLELWREDGLTKTLLKTAEELHRWKTGDQSILNI 177

AAVKD + ENAGMLL N + WRE +T+ L +E+ + DQSIIjN+ 

Sbjct: 130 AAVKDIISEKIYVNNHIFNAGMLLINNKKWREHNITQFCLSLSEKYINSLPDADQSILNL 189 

30 Query: 178 VCHNRWLSIJmTOF--QTYDWSRYNHRSYLYiaiIENRTPNIIHFLTSDKPWNENSVaR 235' 

+ ++WL LN+ +N+ T + +Y YL ++ P IIH+ T KPW R 
Sbjct: 190 IFKDKWLKLNRGYNYLIGTDYLFFKYGKTRYLE-DLGETIPLIIHYNTEAKPWLNIFNTR 248 

Query: 236 FRELWWYYFQLDFCQLTGK 254 
35 FR ++W+Y++L++ + K 

Sbjct: 249 PRNIYWFYYEUSIWQDIYAK 267 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2027 

A DNA sequence (GBSx2138) was identified in S.agalactiae <SEQ ID 6273> which encodes the amino 
acid sequence <SEQ ID 6274>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0417 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2028 

A DNA sequence (GBSx2139) was identified in S.agalactiae <SEQ ID 6275> which encodes the amino
acid sequence <SEQ ID 6276>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.60 Transmembrane 306 - 322 ( 306 - 322)

Final Results

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus ducreyi]
Identities = 88/259 (33%), Positives = 156/259 (59%), Gaps = 11/259 (4%)

20 Query: 7 WLAGDYSYIRQIETTLKSLCVYHESNLSIFIFNQDIPQEWFLAMKDRVGQTGNQIQDVKL 66 

+VLA + SY I TT+KS+ ++++++ ++ N+D P EWF + +++ + ++I D+K+ 
Sbjct: 10 IVLAANQSYSEYILTTIKSIYLHNKHIRFYLLNRDYPTEWFDILNNKLRKLNSEIIDIKV 69 

Query: 67 FHDHLSPKWENKKLNHINY-MTYARYFIPQYISADTVLYLDSDLVVTTNLDNLFQIS^^^ 125 
25 +D + K +HI+ T+ RYFI +1 D V+YLD+D+W +L L+Q + N 

Sbjct: 70 TNDTIK---NFKTYSHISSDTTFFRYFISDFIEQDKVIYLDADIVVNGSLTELYQTDISN 126 

Query: 126 AYLAAVP ALFGLGYGFNAGVMVIMNQRWRQENMTIKLIEKNQKEIENANEGDQTI 180 

+LAAV ++ + FNRG+++INN++WR+ N+T + ++K I + + DQ+I 

30 Sbjct: 127 YFIlAAVKDIISEKIYVNNHIF^IAGMLLINNKKWREHNITQFCLSLSEKyINSLPDADQSI 186 

Query: 181 LNRMFENQVIYLDDTYNFQIGFD-MGAAIDGHKFIFDIPITPLPKIIHYISGIKPWQTLS 239 

LN +F+++ + L+ YN+ IG D + +++ D+ T +P IIHY + KPW + 

Sbjct: 187 LNLIFKDKWLKLNRGYNYLIGTDYLFFKYGKTRYLEDLGET-IPLIIHYNTEAKPWLNIF 245

35 

Query: 240 NMRLREVWWHYHLLEWSSI 258 

N R R ++W Y L W I 
Sbjct: 246 NTRFRNIYWFYYELNWQDI 264 

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 6276 (GBS395) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 75 (lane 5; MW 47.4kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 8; MW 72kDa) and in Figure
177 (lane 5; MW 72kDa).

GBS395-GST was purified as shown in Figure 217, lane 7.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2029 

A DNA sequence (GBSx2140) was identified in S.agalactiae <SEQ ID 6277> which encodes the amino
acid sequence <SEQ ID 6278>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1633 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2030 

A DNA sequence (GBSx2141) was identified in S.agalactiae <SEQ ID 6279> which encodes the amino 
acid sequence <SEQ ID 6280>. Analysis of this protein sequence reveals the following: 

Possible site: 44

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 36 - 52 ( 36 - 52)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10243> which encodes amino acid sequence <SEQ ID 
10244> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC77330 GB:AE000508 orf, hypothetical protein [Escherichia coli K12]
Identities = 75/260 (28%), Positives = 123/260 (46%), Gaps = 22/260 (8%)

Query: 6 VGLVLEGGGMRGLYTAGVLDAFLDASIK-IDGIVSVSAGALFGVNFVSRQRERALRYNKK 64 
30 + LV EGGG RG++TAGVLD P+ A D + SAGA F+ Q A + + 

Sbjct: 25 lALVCEGGGQRGIPTAGVLDEFMRAQFNPFDLYLGTSAGAQNLSAFICMQPGYARKVIMR 84 

Query: 65 YLSHPKYMSLRSWFRTGNFVNKDF TYYEVPMKLD VFDDEAFKKSSIDFYWA 116 

Y + ++ + R GN ++ D+ T + + P+++D +FD S FY+ A 

35 Sbjct: 85 YTTKREFFDPLRPVRGGNLIDLDWLVEATASQMPLQMDTAARLFD SGKSFYMCA 138 

Query: 117 TEMTSGKPEYFKIDSVFEQMEILRASSALPVVSKM-VDWQGKKYLDGGLSDSIPVDFARG 175 

P YF + + ++++RASSA+P + V ,+G YLDGG+SD+IPV A 
Sbjct: 139 CRQDDYAPNYF-LPTKQNWLDVIRASSAIPGFYRSGVSLEGINYLDGGISDAIPVKEAAR 197 

40 

Query: 176 LGFDKLIWMTRPLNYQKKPSSGR LYKTLYRKYPNFVKTASNRYQQYNNSLEKVM 230 

G L+V+ TP P + L + +NV+ Y+ +EK 

Sbjct: 198 QGAKTLWIRTVPSQMYYTPQWFKRMERWLGDSSLQPLVNLVQHHETSYRDIQQFIEKPP 257 

Query: 231 SLEKTGDLFAIRPSKSLVIG 250

+ +++ +P S+ +G 
Sbjct: 258 GKLRIFEIYPPKPLHSIALG 277 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8959> and protein <SEQ ID 8960> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -5.16
GvH: Signal Score (-7.5): -2.17

Possible site: 44
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -0.16 threshold: 0.0 

INTEGRAL Likelihood = -0.16 Transmembrane 36 - 52 ( 36 - 52) 
PERIPHERAL Likelihood =4.14 18 
modified ALOM score: 0.53 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01611(316 - 1050 of 1449) 

OMNI|nT01EC5264(37 - 289 of 369) hypothetical protein 
%Match =9.2 

%Identity =29.7 %Similarity =49.8 

Matches = 74 Mismatches = 118 Conservative Sub.s = 50 

273 303 333 363 393 420 450 480 

QKKQLYFAIL*SNINIRK*LPMLSVGLVLEGGGMRGLyrAGVLDAFLDAGIK-IDGIVSVSAGALFGVNFVSRQRERALR 

: II nil ll-llllll |: I 1 : llll 1= I I = 

VGQRIPVTLGNIAPLSLRPFQPGRIALVCEGGGQRGIFTAGVLDEFMRAQFNPFDLYLGTSAGAQNLSAFICNQPGyARK 

30 40 SO 60 70 80 90 

510 540 588 618 648 678 708 

■YNKKYLSHPKYMSLRSWFRTGNFVNKDF TYYEVPMKLDVFDDEAFKKSSIDFYWATEMTSGKPEYFKIDSVFEQM 

:| : :: : I I!-: 1= I -l:-! = I 11= I 111 = = = 

VIMRYTTKREFFDPLRFTOGGNLIDIiDWLVEATASQMPLQMDT--AARLFDSGKSFYMCACRQDDYAENYF-LPTK^^ 

110 120 130 140 150 160 

738 765 795 825 855 885 912 930 
EILRASSALPWSKM-VDWQGKKYLDGGLSDSIPVDFARGLGFDKLIVVMTRPIiNYQKKPSS-GRLYKTL YRKYPN 

::=lllll=l = I =1 11111=11=111 I I 1=1= II I 1= = I =1 

DVIRASSAIPGFYRSGVSLEGINYLDGGISDAIPVKEARRCJGaUCTLVVIRTVPSQMYYTPQWFKRMERWLGDSSLQPLVN 

180 190 200 210 220 ' 230 240 

960 990 1020 1050 1080 1110 1140 1170 

FVKTASNRyQQYNNSLEKVMSLEKTGDLFAIRPSKSLVIGRLEKNPDKLDSIYQLGMKDAKSVMPELNSYLMK*RKQYFS 

:|: |: =11 = : = = =1 h =1 

LVQHHETSYRDIQQFIEKPPGKLRIFEIYPPKPLHSIALGSRIPALREDYKLGRLCGRYFLATVGKLLTEKAPLTRHLVP 
260 270 280 290 300 310 320 

SEQ ID 8960 (GBS394) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 75 (lane 4; MW 34.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 7; MW 60kDa). 

GBS394-GST was purified as shown in Figure 217, lane 6.

Example 2031

A DNA sequence (GBSx2142) was identified in S.agalactiae <SEQ ID 6281> which encodes the amino 
acid sequence <SEQ ID 6282>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3004 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2032 

A DNA sequence (GBSx2143) was identified in S.agalactiae <SEQ ID 6283> which encodes the amino 
acid sequence <SEQ ID 6284>. This protein is predicted to be transporter protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 49

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.3739 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22759 GB:U32790 transporter protein [Haemophilus influenzae Rd]
Identities = 139/391 (35%), Positives = 221/391 (55%), Gaps = 4/391 (1%)

Query: 6 INKNNWRALIAAIVASGTDDLNIMFLAFSMSTIITDLHLSAAQAGWIGTITNLGMLVGGL 65 

+N W+ALI + V G D +++ L F +S I DL+L+ AQ G + T T +G + GG+ 
Sbjct: 5 VNSYGWKALIGSAVGYGMDGFDLLILGFMLSAISADLNLTPAQGGSLVTWTLIGAVFGGI 64 



Query: 66 IFGLLADRYNKFKVFKWTILIFSIATGLVFFTTNLSYLYIMRFIAGIGVGGEYGIAIAIM 125 

+F6 L+D+Y + +V WTIL+F++ TGL L I R lAGIG+GGE+GI +A+ 

Sbjct: 65 LFGALSDKYGRVRVLTWTILLFAVFTGLCAIAQGYWDLLIYRTIAGIGLGGEFGIGMALA 124 

Query: 126 AGIVPT^OTlGRISSLNGIAGQVGSISSALIlAGWLAPALGWRGLFLFGLLPIVLVLWMQFA 185 

A P + +S + QVC- + +ALL L P +GWRG+FL G+ P + +++ 

Sbjct: 125 AEAWPARHRAKAASYVALGWQVGVLGAALLTPLLLPHIGWRGMFLVGIFPAFVAWFLRSH 184 

Query: 186 VDDKDILDQYHTDADDEPLDI SIKALFDTPVIATQSLALMVMTTVQIAGYFGMMNW 241 

+ + +IQT + S + L +SL ++V+T+VQ GY+G+M W 

Sbjct: 185 LHEPEIFTQKQTALSTQSSFTDKLRSFQLLIKDKATSKISLGIWLTSVQNFGYYGIMIW 244 

Query: 242 LPTIIQTNLNVSVKNSSLWMIATILGMCLGMLVFGQLLDKFGPRLVYGCFLLSSAICVYL 301 

LP + L S+ S LW T+ GM G+ +FGQL D+G + + FL + I + + 
Sbjct: 245 LPNFLSKQLGFSLTKSGLWTAVTVCG^tlAGIWIFGQLADRIGRKPSFLLFQLGAVISIVV 304 

Query: 302 FQFATTMPSMIIGGAVVGFFVNGMFAGYGAMITRLYPHHIRSTANNLILNVGRAIGGFSS 361 

+ T M++ GA +6 FVNGM GYGA++ YP R+TA N++ N+GRA+GGF 
Sbjct: 305 YSQLTDPDIMLLAGAFLGMFVNGMLGGYGALMAEAYPTEARATAQNVLFNIGRAVGGPGP 364 

Query: 362 VIIGMILDVSNVSMVMLFLASLYIVSFLSML 392 

V++G ++ + + LA +Y++ L+ + 
Sbjct: 365 WVGSWLAYSFQTAIALLAIIYVIDMLATI 395 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2377> which encodes the amino acid
sequence <SEQ ID 2378>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 306/402 (76%) , Positives = 354/402 (87%) 

Query: 1 MSPLNINKNNWRMiIAAIVASGTDDLNIMFLAFSMSTIITDLHLSAAQAGWIGTITNLGM 60 

MS L+++ N RAL+AAI ASGTDDLN+MFLAFSMS+I+TDL LS Q GWI TITNLGM 
Sbjct: 1 MSTLSLDTTNKIUUl.VaAIAaSGTDDIJJVMFLAFSMSSim'DIiGLSGTQGGWIATIT^ 60 

Query: 61 LVGGLIFGLLADRYNKFKVFKWTILIFSIATGLVFFTTNLSYLYIMRFIAGIGVGGEYGI 120
LVGGL+FGLLADR++KFKVFKWTIL+FS+ATGL++FT +L YLY+MRFIAGIGVGGEYG+ 
Sbjct: 61 LVGGLLFGLLADRHHKFKVFKWTILLFSVATGLIYFTQSLPYLYLMRFIAGIGVGGEYGV 120 

Query: 121 AIAIMAGIVPTNKMGRISSXJJGIAGQVGSISSALLAGWLAPALGWRGLFLFGLLPIVLVL 180 

AIAIMftGIVP KMGR+SSIiNGIAGQ+GSISSAIiLa6WLAP+LGWRGLFLFGLLPI+LV+ 
Sbjct: 121 AIAIMA6IVPPEKMGRMSSLNGIAGQLGSISSALLAGWLAPSLGWRGLFLFGLLPILLVI 180 

Query: 181 WMQFAVDDKDILDQYNTDADDEPLDISIKALFDTPVLATQSLALMVMTTVQIAGYFGMMN 240 

WM A+DD+ IDY+++ IILFTL Q+LALMVMTTVQIAGYFGMMN 
Sbjct: 181 VmTLAIDDQKIWDHYGQEEEECSQPIKINELFKTKSLTAQTLAI^WMTTVQIAGYFGMMN 240 

Query: 241 WLPTIIQTNLNVSVKNSSLWMIATILGMCLGMLVFGQLLDKFGPRLVYGCFLLSSAICVY 300 

WLPTIIQT+LN+SVK+SSLWM+ATI+GMCLGML FGQLLD FGPRL+Y FLL+S+ICVY 
Sbjct: 241 WLPTIIQTSLNLSVKSSSLWMVATIVGMCLGMLYFGQLLDCFGPRLIYSLFLLASSICVY 300 

Query: 301 LFQFATTMPSMIIGGAWGFFVNGMFAGYGAMITRLYPHHIRSTANNLILNVGRAIGGFS 360
LFQFA +M SM+IGGA+VGFFVNGMFAGYGAMITRLYPHHIRSTANN+ILNVGRA+GGFS 
Sbjct: 301 LFQFANSMASMVIGGAIVGFFVNGMFAGYGAMITRLYPHHIRSTANNVILNVGRALGGFS 360 

Query: 361 SVIIGMILDVSNVSMVMLPLASLYIVSFLSMLSIKQLKRQKY 402 

SV IG ILD S +SMVM+FLASLY++SP . +M SI QLK ++Y 
Sbjct: 361 SVAIGSILDASGISMVMIFLASLYVISFGRMWSIGQLKAERY 402 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2033 

A DNA sequence (GBSx2144) was identified in S.agalactiae <SEQ ID 6285> which encodes the amino 
acid sequence <SEQ ID 6286>. This protein is predicted to be leucyl-tRNA synthetase (leuS). Analysis of 
this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3481 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10241> which encodes amino acid sequence <SEQ ID 
10242> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC00259 GB:AF008220 leucine tRNA synthetase [Bacillus subtilis]
Identities = 569/835 (68%), Positives = 666/835 (79%), Gaps = 42/835 (5%)

Query: 10 YNHKEIEPKWQAFWADNHTFKTGTDASKPKFYALDMFPYPSGAGLHVGHPEGYTATDILS 69 

+ HKEIE KWQ +W +N TF T + K KFYALDMFPYPSGAGLHVGHPEGYTATDILS 
Sbjct: 3 FQHKEIEKKWQTYWLENKTFATLDNNEKQKFYALDMFPYPSGAGLHVGHPEGYTATDILS 62 

15 

Query: 70 RFKEAQGHNVLHPMGWDAFGLPAEQYAMDTGNDPAEFTAENIANFKRQINALGFSTOWDR 129 

R KR QG++VLHPMGWnAFGLPAEQYA+DTGNDPA FT +NI NF+RQI ALGFSYDWDR 
Sbjct: 63 RMKRMQGYDVLHPMGWimFGLPAEQYALDTGNDPAVFTKBNinNFRRQIQAIiGFSYDTO^ 122 

20 Query: 130 EVNTTDPNYYKM'QWIFTKLYEKGLAYEAEVPVNWVEELGTAIANEEVLPDGTSERGGYP 189 

E+NTTDP YYKWTQWIF KLYEKGLAY EVPVNW LGT +ANEEV+ DG SERGG+P 
Sbjct: 123 EINTTDPEYYKWTQWIFLKLYEKBLAYVDEVPVNWCPAIXSTVLANEEVl-DGKSERGGHP 181 

Query: 190 VTOKPMRQVWOiKITAYAERLLEDLEEVDWPESIKDMQHlWIGKSTGfiNVTFKVK^ 249 
25 V R+PM+QWMLKITAYA+RLLEDLEE+DWPESIKCMQRISIWIG+S GA+V F + D F 

Sbjct: 182 VERRPMKQWMLKITAYADRLLEDLEELDWPESIKDMQRNWIGRSEGAHVHFAIDGHDDSF 241 

Query: 250 TVFTTRPDTLFGATYAVLAPEHALVDAITTADQAEAVAEYKRQASLKSDLARTDLAKEKT 309 
TVFTTRPDTLFGATY VLAPEHALV+ ITTA+Q ERV Y ++ KSDL RTDLAK KT 
30 Sbjct: 242 TVFTTRPDTLFGATYTVLAPEHALVENITTAEQKEAVEAYIKEIQSKSDLERTDIiAKTKT 301 

Query: 310 GVWnaYAINPVNGKEIPVWIADYVLASYGTGAIMAVPAHDERDWEFAKQEl^ 369 

GV+TGAYAINPVNG+++P+WIADYVLASYGTGA+MAVP HDERD+EFAK F L + V++ 
Sbjct: 302 GVFTGAYAINPVNGEKLPIWIADYVLASYGTGAVMAVPGHDERDFEFAKTFGLPVKEVVK 361 

35 

Query: 370 GGNVEEAAFTEDGLHINSDFLDGLDKAAAIAKMVEWLEAEGVGNEKVTYRLRDWLFSRQR 429 

GGNVEEAA+T DG H+NSDFL+GL K AI K++ WLE G +KVTYRLRDWLFSRQR 
Sbjct: 362 GGNVEEAAYTGIX3EHVNSDFLNGnHKQEAIEKVIAWLEETKNGEKK\mrai^ 421 

40 Query: 430 YWGEPIPIIHWEDGTSTAVPESELPLVLPVTKDIRPSGTGESPLANLTDWLEVT-REDGV 488 

YWGEPIP+IHWEDGTSTAVPE ELPL+LP T +I+PSGTGESPLAN+ +W+EVT E G 
Sbjct: 422 YWGEPIPVIHWEDGTSTAVPEEEIiPLILPKTDEIKPSGTGESPLANIKEWVEVTDPETGK 481 

Query: 489 KGRRETNTMPQWAGSSWYYLRYIDPHNTEKLADEELLKQWLPVDIYVGGAEHAVLHLLYA 548 
45 KGRRETNTMPQWAGS WY+LRYIDPHN ++IiA E L++WLPVD+Y+GGAEHAVLHLLYA 

Sbjct: 482 KGRRETISnMPQWAGSCSreFIRYIDPHNPDQIJ^PEKLEKM^PVDMYlGGAEHA^ 541 

Query: 549 RFWHKVLYDIX3VVPTKEPFQKLFNQGMIIX3TSYRDSRGALVATDKVEKRDGSFFHVETGE 608 
RFWHK LYD+GWPTKEPFQKL+NQGMILG EE 
50 Sbjct: 542 RFWHKFLYDIGWPTKEPFQKLYNQGMILG- -- ENNE 575 

Query: 609 ELEQAPAKMSKSLK^rWNPDDVVEQYGRDTLRVYEMFMGPLDASIAWSEEGLEGSRKFLD 668 

KMSKS NVVNPD++V +GaDTIiR+YEMPMGPLDASIAWSE GL+G+R+FLD 
Sbjct: 576 KMSKSKGNVVNPDEIVASHGaDTLRLYEMFMGPLDASIAWSESGLDGARRFLD 628 

55 

Query: 669 RVYRLI TTKEITEKNSGALDKVYNETVKAVTEQVIX3MKFNTAIAQ]»;FV^^ 722 

RV+RL +1 E L++VY+ETV VT+ + ++ENT I+QLMVF+N A 

Sbjct: 629 RVWRLFIEDSGEIJJGKIVEGAGETLERVYHETVMKVTDHYEGLRFNTGISQLMVFINEAY 688 

60 Query: 723 KEDKLFSDYAKGFVQLIAPFAPHLGEELWQVLTASGQSISYVPWPSYDESKLVENEIEIV 782 

K +L +Y +GFV+L++P APHL EELW+ L SG +I+Y WP YDE+KLV++E+EIV 
Sbjct: 689 KATELPKEYMEGFVKLLSPVAPHLAEELWEmSHSG-TIAYEAWPVYDETKLVDDEVEIV 747 

Query: 783 VQIKGKVKAKLWAKDLSREELQDLALANEKVQAEIAGKDIIKVIAVPNKLVNIV 837 
65 VQ+ GKVKAKL V D ++E+L+ LA A+EKV+ ++ GK I K+IAVP KLVNIV 
Sbjct: 748 VQLNGKVKAKLQVPADATKEQLKQIAQMEKVKEQLEGKTIRKIIAVPGKLVNIV 802

A related DNA sequence was identified in S.pyogenes <SEQ ID 6287> which encodes the amino acid 
sequence <SEQ ID 6288>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4358 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 815/833 (97%) , Positives = 827/833 (98%) 

15 

Query: 7 mFYNHKEIEPKWQAFWADNHTPKTGTnASKPKFYALDMFPYPSGAGLHVGHPEfe 66 

MTFY+H lEPKWQAFWftDNHTFKTGTIWSKPKFYMiDMFPYPSGaGLHVGHPEGYTATD 
Sbjct: 1 MTFYDHTAIEPKWQAFWRDNHTFKTGTnASKPKFYALDMFPYPSGaGLHVGHPEGYTATD 60 

20 Query: 67 ILSRFKRAQGHNVLHPMGWDAFGLPAEQYAMDTGNDPAEFTAENIANFKRQINALGFSYD 126 

HjSRFKRAQGHN+LHPMGWDAFGLPAEQYAMDTGNDPAEFTAENIANFKRQINALGFSYD 
Sbjct: 61 ILSRFKRAQGHNIIiHPMGWDAFGLPAEQYAimTGNDPAEFTAENIANFKRQINRIKSFSYD 120 

Query: 127 WDREVNTTDENYYKWTQWIFTKLYEKGLAYEAEVFVNWVEEI/STAIANEEVLPDGTSE 186 
25 TOREVNTTDPNYYKWTQWIFTKIYEKGIAYEAEVPVNWVEELGTAIMIEEV^ 

Sbjct: 121 WDREVNTTDPNYYKWTQWIFTKLYEKGLAYEAEVPVNWVEELGTAIANEEVLPDGTSERG 180 

Query: 187 GyPVTOKPMRQWMLKITAYAERLLEDLEEVDWPESIKDMQRNWIGKSTGANVTFKVKDTD 246 
GyPWRKPMRQWMLKlTAYAERLLEDLEEVDWPESIKDMQRNWIGKSTGANVTFKVKDTD 
30 Sbjct: 181 GYPVWKPMRQWMLKITAYAERLLEDLEEVDWPESIKDMQRNWIGKSTGfiNVTFKyKDTO 240 

Query: 247 KDFOTFTTRPDTLFGaTYAVLAPEHALVDAITTADQAEAVaEYKRQASLKSDIARTDLAK 306 

KDFTVFTTRPDTLFGATYAVLAPEHftLVraiTTADQaEAVA+YKRQRSLKSDLARTD^ 
Sbjct: 241 KDFTVFTTRPDTLFGATYAVI^EHftLVDAITTADQaEAVAKYKRQASLKSDIARTDr^ 300 

35 

Query: 307 EKTGVWTGAYAINPVNGKEIPWIADYVLASYGTGAIMAVPAHDERDWEFAKQENLDIIP 366 

EKTGVWTGAYAINPVNG E+PVWIADYVLASYGTGAIMAVPAHDERDWEFAKQF LDIIP 
Sbjct: 301 EKTGVVm3AYAINPVNGNEMPWIADYVIASYGTGAimVPAHDERDVraFAKQFKLDIIP 360 

40 Query: 367 VLEGGNVEEAAFTEIX3LHINSDFLDGLDKaAAIAKMVEWLEftEGVGNEKVTYRLRDWLFS 426 

VLEGGNVEEAAFTEDGLHINS FLIWLDKA+AIAKMVEWLEAEGVGNEKVTYRLRDWIjFS 
Sbjct: 361 VIKSGNVEEAAFTEDGIJIINSGFLDGLDKRSAIAKMVEWLEAEGVGNEKVITO^ 420 

Query: 427 RQRYWGEPIPIIHWEDGTSTAVPESELPLVLPVTKDIRPSGTGESPLANLTDWLEVTRED 486 
45 RQRYWGEPIPIIHWEDGTSTAVPESELPLVLPVTKDIRPSGTGESPLAN+TDWLEVTRED 

Sbjct: 421 RQRYWGEPIPIIHWEDGTSTAVPESELPLVLPVTKDIRPSGTGESPIiANVTDWLEVTRED 480 

Query: 487 GVKGRRETNTMPQWAGSSWYYLRYIDPHNTEKLftDEELLKQWLFTOIYVGGaEHAVLHLL 546 
GVKGRREmmPQWAGSSWYYLRYIDPHNTEKIADEELLKQWLPTOIYVGGaE^ 
50 Sbjct: 481 GVKGRRETimiPQWAGSSWYYLRYIDPHimiKLftDEEIiLKQWLFVDIYVGGaEHAV^ 540 

\ 

Query: 547 YARFWHKVLYDLGWPTKEPFQKLFNQGMILGTSYEDSRGALVATDKVEKRDGSFFHVET 606 

YARFWHKVLYDLGVVPTKEPFQKLFNQGMILGTSYRDSRGALVATDKVEKRDGSFFHVET 
Sbjct: 541 YARPWHKVLYDLGVVPTKEPFQKLENQGMILGTSYRDSRGaLVATDKVEKRDGSFFHVET 600 

55 

Query: 607 GEELEQAPAKMSKSLKNVVNPDOTVEQYGaDTLRVYEMFMGPLDASIAWSEEGLEGSRKF 666 

GEELEQAPAKMSKSLKNWNPDDWEQYGADTLRl^MFMGPLDASIAWSEEGLEGSRKF 
Sbjct: 601 GEELEQAPAKMSKSLKNWNPDDWEQYGADTLRVYEMFMGPLDASIAWSEEGLEGSRKF 660 

60 Query: 667 LDRVYRLITTKEITEENSGRLDKVYlffiTVKAOTEQVDQMKFOTAIAQLMVFVNRRNKEDK 726 

LDRVYRLITTKEITEENSGALDKVYNETVKAOTEQVDQMKFNTAIAQLMVFWIiafiNKEDK 
Sbjct: 661 LDRVYRLITTKEITEENSGALDKVYNETVKAOTEQVDQMKIOTAIAQLMV^^ 720 

Query: 727 LFSDYAKGFVQLIAPPAPHLGEELWQVLTASGQSISYVPWPSYDESKLVENEIEIWQIK 786 
65 LFSDYAKGFVQLIAPFAPHLGEELWQ LTASG+S1SYVPMPSYDESKLVEN++EIWQIK 
Sbjct: 721 LFSDYAKGFVQLIAPFAPHLGEELWQALTASGESISYVPWPSYDESKLVENDVEIWQIK 780

Query: 787 GIWKAKLVVAKDLSREELQDIAIANEKVQJffiIAGKDIIKVIAVEim.W 839 

GKWK3UaiVVAKDLSREEI^++ALaiffiKVQaEIAGKI)IIKVIAVPNK^ 
Sbjct: 781 GKVKRKLWAKDLSREEWJEVAIJaffiKVQftEIAGKDIIKVIAVPNKLVNIVIK 833 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2034 

A DNA sequence (GBSx2145) was identified in S.agalactiae <SEQ ID 6289> which encodes the amino
acid sequence <SEQ ID 6290>. This protein is predicted to be KIAA1074 protein. Analysis of this protein
sequence reveals the following:

Possible site: 35

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8961> which encodes amino acid sequence <SEQ ID 8962>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 19
Peak Value of UR: 2.86
Net Charge of CR: 4
McG: Discrim Score: 10.27
GvH: Signal Score (-7.5): -3.61

Possible site: 31
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 0 value: 2.12 threshold: 0.0
PERIPHERAL Likelihood = 2.12 7
modified ALOM score: -0.92

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8962 (GBS117) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 8; MW 22.5kDa).

GBS117-His was purified as shown in Figure 200, lane 7.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2035

A DNA sequence (GBSx2146) was identified in S.agalactiae <SEQ ID 6291> which encodes the amino
acid sequence <SEQ ID 6292>. This protein is predicted to be YirC (resE). Analysis of this protein
sequence reveals the following:

Possible site: 28

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.88 Transmembrane 177 - 193 ( 173 - 196)
INTEGRAL Likelihood = -4.09 Transmembrane 10 - 26 ( 5 - 29)

Final Results

bacterial membrane --- Certainty=0.5352 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15292 GB:Z99120 similar to two-component sensor histidine
kinase [YvqA] [Bacillus subtilis]
Identities = 108/379 (28%), Positives = 193/379 (50%), Gaps = 33/379 (8%)

20 Query: 92 nNHKKESHDlIKYLTQKRLWQISKEKDGMFVTIKKKTYYVMTKDYSGI^ 151 

+N + S + L+ + ++ K D KKK Y + D +G V IKK 

Sbjct: 86 ENEEaSSDKDLSILSSSFIHKVYKLADKQ--EAKKKRY---SADVNGEKVFFVIKKiGLSV 140 

Query: 152 QSQLFHVINFS DITYTQHLITKINHFLIVILVLTYIPMLFIMRKTFTGIRESIQ 205 

25 Q +++++ D+ YT L ++ + V+++L++IP +++ + + + 

Sbjct: 141 NGQSAMMLSYALDSYRDDLAYT— LFKQLLFIIAWILLSWIPAIWLAKY LSRPLV 194 

Query: 206 SVQTYISSLWKNQGNHQSSQKEIVFSDFDPLLLESQEMANRIYQAEESQRNFFQMASHEL 265 
S+++++++ K + L +EM ++ Q +E++R QN SH+L 

30 Sbjct: 195 SFEKHVKRI--SEQDWDDPVKVDRKDEIGKLGHTIEEMRQKLVQKDETERTLLQNISHDL 252 



Query: 265 RTPLMSIQGYTEGVQEGII DAELAHSVILQESKKMKQLVDDIILLSKLD- -SNLSDQ 320 

+TP+M I+GYT+ +++GI D E VI E+ K+++ + D++ L+KLD + Q 
Sbjct: 253 KTPVMVIRGYTQSIKDGIFPKGDLENTVDVIECEALKLEKKIKDLLYLTKLDYLAKQKVQ 312 

35 

Query: 321 KDEFSLNELIiNSIIAYFKPLANKQKISITYRPDKHEKLLK-GNEELIQRAINNILSNaLR ,379 

D FS+ E+ +1 K A K+ +++ D E +L G+ E + + NIL N +R 
Sbjct: 313 HDMFSIVBVTEEVIERLK-WARKE---LSWEIDVEEDILMPGDPEQMISIKLLENILENQIR 368 

Query: 380 YAVSHIEISYT NQKLTISNDGPAISKEDLPYIFDRFYKGHGGQTGIGLAMTKEIIK 435

YA + lEIS N +TI NDGP I EL +++ F KG G+ GIGL++ K 1+ 

Sbjct: 369 YAETKIEISMKQDDRNIVITIKNDGPHIEDEMLSSLYEPFNRGKKBEPGIGLSIVKRILT 428 

Query: 436 QHHGNIIAESDSTSTTFTI 454 
H +1 E+D T ++ I

Sbjct: 429 LHKASISIENDKTGVSYRI 447 

There is also homology to SEQ ID 1178.

SEQ ID 6292 (GBS279) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 52 (lane 7; MW 54.5kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 58 (lane 6; MW 79.4kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2036

A DNA sequence (GBSx2147) was identified in S.agalactiae <SEQ ID 6293> which encodes the amino 
acid sequence <SEQ ID 6294>. This protein is predicted to be two-component response regulator (mtrA). 
Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1706 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10239> which encodes amino acid sequence <SEQ ID 
10240> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05663 GB:AP001513 two-component response regulator [Bacillus halodurans]
Identities = 87/220 (39%), Positives = 124/220 (55%), Gaps = 4/220 (1%)

Query: 11 lYFADDEKNIRDLVVPFLEHDGFTVRAFETGDLLLEAYKNQKPDLVILDIMMPGTNGLDV 70 
I DDE ++R+LV +L +GF V ETGD ++ + + DLV+LD+MM +G

Sbjct: 7 ILIVDDELDLRELVTSYLRKEGFAVYTAETGDEAIKRLEQEPiroLVVLDVMMDEMDGFTA 66 

Query: 71 MKSIRQYDNIPIIMLTARDSDVDFITAFNLGTDDYFTKPFSPIKLSLHVKALFKRLDEKA 130 
K IR + IPIIMLTAR + D + +G DDY KPFSP +L ++ +R 
Sbjct: 67 CKEIRAFSQIPIIMLTARGGEDDKVMGLQIGADDYIVKPFSPRELVARIEVALRRTQGIQ 126

Query: 131 IKNDTQYQFLDLTLDTEKRIALLSNEEMPLTRTEFDFLLVLIEKPETAFSRETLLNRIWG 190 

+DT Y+F +L + R ++ +E+ LTK E+D L+ L+E F+RE L +R+WG 

Sbjct: 127 QVDDTGYRFNELRIQPSGRKVFVNGQEISLTKKEYDLLVFLLEHRGRVFTREHLHDRLWG 186 

Query: 191 FDDIES--RAVDDTIKRLRKKFKQYHSQVSIKTVWGYGFK 228 

D+ RVDIKLRKK + IKTVWG G+K 
Sbjct: 187 KETQQGTLRTVDTHIKTLRLKLKP--ADRFIKTVWGVGYK 224 

There is also homology to SEQ ID 3260.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2037 

A DNA sequence (GBSx2148) was identified in S.agalactiae <SEQ ID 6295> which encodes the amino
acid sequence <SEQ ID 6296>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.18 Transmembrane 1568 -1584 (1568 -1585)
INTEGRAL Likelihood = -0.16 Transmembrane 338 - 354 ( 338 - 354)

Final Results

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10237> which encodes amino acid sequence <SEQ ID
10238> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG09771 GB:AF243528 cell envelope proteinase [Streptococcus thermophilus]
Identities = 797/1594 (50%), Positives = 1056/1594 (66%), Gaps = 39/1594 (2%)

Query: 21 MNTKQRFSIRKYKLGRVSVLIX3TLFFLGGITimaDSVIl^SDIAVEQQVKDSPTS-IA 79 

M K+ FS+RICYK+G VSVLLG +F G +VaftD + + + VEVD+SA 
Sbjct: 1 MKK3CETFSLRKYKIGTVSVLt.GAVFLFAGAPSVaADELTSI.V-ETKVEATVPDAIVSESA 59 



Query: 80 NETPTNN--TSSAI1ASTAQDNLVTKANNSPTETQPVAESHSQATETFSPVANQPVESTQE 137 

+E+P +++ +T+ D T ++ + S + ET P P S ++ 

Sbjct: 60 SESPWEELVDTSVEATSTDVTTTDNEEETPGSEALENSANTEVETTQPAVETPAISEKK 119 

Query: 138 VSKTPLTKQISnjAVKSTPAISKET--PQiNIDSNKIITVPKOT(^^ 195 

V + K ++A ++T ++E PC3NIDSN IITVPKVW +GYKiGEGTWAIIDSGriD 
Sbjct: 120 VEEEE--KLSVADETTAIOTQEEAKPQNIDSNTIITVPKVWYSGYKGEGTWAIIDSGLD 177 

Query: 196 INHDALQLMJSTKAKYQNEQQmAAKAKAGIOTGKWYmK^TIFGHimTDVOT'ELKEV^ 255 

++HD L ++D + AKy++E+++ AAK AGI YG+W+N+KV+FG+NYVDVNT LKE 
Sbjct: 178 VDHDVLHISDLSTAKYKSEKEIEAAKEAAGITYGEWEiroKWFGYNYVDVN^ 237 

Query: 256 SHGMHVTSIATANPSKKDTNELIYGVAPEAQVMFMRWSDEKRGTGPALYVKAIEDAVKL 315 

SHGMHVTSIAT NP++ +L+YGWAPEAQVMFMRVFSD K TG ALYVKAIEDAVKL 
Sbjct: 238 SHGMHVTSIATGNPTQPVAGQLMYGVAPEAQVMFMRVFSDLKATTGAALYVKAIEDAVKL 297 

Query: 316 GADSINLSLGGANGSLVNADDRLIKALEMARLAGVSWIAAGNDGTFGSGASKPSALYPD 375 

GADSINLSLGGANGS+VN ++ + A+E AR AGVSWIAAGKDGTFGSG S PSA YPD 
Sbjct: 298 GADSINLSIXSGANGSVVNMNENVTAAIEAARRAGVSWIAAC^ 357 

Query: 376 YGLVGSPSTAREAISVASYNimTiVNKVFNIIGLENNK^^ 435 

YGLVG+PSTA +AISVaSYNin:T+ +KV NIIGLENN +IJ!I G +++ +P+ S FE+G 
Sbjct: 358 YGLVGAPSTAHDAISVRSYmrTVGSKVINIIGLEmADIJSrYGKSSFDNPEK^ 417 

Query: 436 KQYDYVFVGKGOTDNDYKDKTLNGKIArilERGDITFTKKVVNAINHGAVGAIIFNNKAGEA 495 

K+Y+YV+ G G +D+ L GK+ALI+RG ITF++K+ NA GAVG +IFN++ GEA 
Sbjct: 418 KEYEYWAGIGQASDFDGLDLTGKUiLIKRGTITFSEKIANATAAGAVGWIENSRPGEA 477 

Query: 496 NLTMSLDPEASAIPAIFTQKEFGDVIJUCNNYKIVFtmiKimQANPlC^^ 555 

N++M LD A AIP++F EFG+ LA N+YKl FNN + + NP AG+LSDFSSWGL+A 
Sbjct: 478 NVSMQLDDTAIAIPSVFIPLEFGEALAANSYKIAFNNETDIRPNPEAGLLSDFSSWGLSA 537 

Query: 556 DGQLKPDLSAPGGSIYAAINDNEYDMMSGTSMASPHVAGATALVKQYLLKEHPELKKGDI 615 

DG+LKPDL+APGG+IYAAINDN+Y M GTSMASPHVAGA LVKQYLL +P +1 
Sbjct: 538 DGELKPDLAAPGGAIYAAINDNDYAimQGTSMASPHVAGAAVLVKQYLLATYPTKSPQEI 597 

Query: 616 ERTVKYLLMSTAKRHLNKDTGAYTSPRQQGAGIIDVAAAVQTGLYLTGGE^ 675 

E VK+LLMSTAKAH+NK+T AYTSPRQQGAGIID AAA+ TGLYLT GE+ YGS+TL6N 
Sbjct: 598 EALVKHLLMSTAKAHVNKETTAYTSPRQQGaeilDTAAAISTGLYLT-GEDGYGSlTLGN 656 

Query: 676 IKDKISFDVTrasIINKOTUCDLHYTTYLOTDQVKDGFVTLAPQQLGTFTGKTIRIEPGQTQ 735 

++D SF VT+HNI K L+Y+T LTD+ L + + + + ++ + 

Sbjct: 657 VEDTFSFTVTLHNITNEDKTIJilYSTQLTTDTAQKRIDHLGSTSISRDS 716 

Query: 736 TITIDIDVSKYHDMLKKVMENGYFLEGYVRFTDPVDGGEVLSIPYVGFKGEFQNLEVLEK 795 

T+TI++D S + + L +M NGY+LEG+VRFTD D G+++SIPYVGF+GEFQNL VLE+ 
Sbjct: 717 TVTINVDASSFAEELTGLMKNGYYLEGFVRFTDVADDGDIVSIPYVGFRGEFQNLAVLEE 776 

Query: 796 SXYKLVANKEKGFYFQP--KQraEVPGSEDYTALMTTSSEPIYSTDGTSPIQLKAIiGSYK 853 

lY L+A+ + GFYF+P Q N V S YT L+T S+E lYSTD S +K LG++K 
Sbjct: 777 PIYmilADGKGGFYFEPVTAQENTVDISHHYTGLVTGSTELIYSTDKRSDSAIKTLGTPK 836 

Query: 854 SIWSKWILQLDQKGQPHLAISPNDDQNQDAVAWGVFLRNEKnSILRAKVYRADDVNLQKPL 913 

+ G ++L+LD+ G+PHLAISPN D NQD++ KGVFLRNh- +L A VY ADD PL 
Sbjct: 837 NKAGYFVLELDESGKPHIAISPNGDDNQDSLWKGVFLRlJYTDLVASVYAaDDTERTNPL 896 

Query: 914 WVSAPQAGDKNYYSGNTENPKSTFLYDTEWKGTTTDGIPLEDGKYKYVLTYYSDVPGSKP 973 

W S PQ+GDKN YSGN +NPKS+ +Y TEW GT +DG L DGKY+YVLTY S VPG+ 
Sbjct: 897 VffiSQPQSGDKHIYSGKPKNPKSSIIYPTEWNGTDSDGKRlMXSKYQYVLTYSSKVPGSA^ 956 



Query: 974 QQMVFDITLDRQAPTLTTATYDKDRRIFKARPAVEHGESGIFREQVFYLKKDKDGHYNSV 1033
Q M+FD+ +DR++P +TTATYD+ F RPA+E GESG++REQVFYL D G ++
Sbjct: 957 QTMIPDyilDRESPVITTATYDETNFTFNPRPAIEKGESGLYREQVFYLVftDASG-VTTI 1015 

Query: 1034 LRQQGEDGILVEDNKVPIKQEKDGSFILPKEVJSTOFSHVyYTVEDYAGNLVSAKLEDLINI 1093 
+ V DNKVF+ Q DGSF LP ++ D S YYTVEDYflGN+ K+E+LI+I

Sbjct: 1016 PSIiLKNGDVTVSDNKVFVAQNDDGSFTLPLDLADISKFYYTVEDyAGNISYEKVENLISI 1075 

Query: 1094 GNKNGLVNVKVFSPELNSNVDIDFSYSVKDDKGNIIKK-QHHGKDIjNLLKLPFGTYTFDL 1152 
GN+ GLV V + + NS V I FSYSV D+ G 1+ + + D ++LKLPFGTYTFDL 
Sbjct: 1076 GNEKGLVTVNILDKDTNSPVPILFSYSVTDETGKIVaELPRYAGDTSVLKLPFGTYTFDL 1135

Query: 1153 FLYDEERANLISPKSVTVTISEKDSLKimiFKVNLLKKAftLriVEFDKLLPRG^ 1212 

FLYD E ++L VTI E +S +V F V L KA LL++ D LLP G+T+QLVT 

Sbjct: 1136 FLYIWEWSSLAGETKAVVTILEDNSTAEVNFYOTLKDKANLLIDIDALLPSGSTIQLOT 1195 

Query: 1213 TNTVVDLPKATYSPTDYGKMIPVGDYRLNVTLPSGYSTLENLDDLLVSVKEDQVNLTKLT 1272^ 

+ LP A YS TDYGK +PVG Y + TLP GY LE LD V+V +Q N+ KLT 
Sbjct: 1196 DGQAIQLENAKYSKTDYGKFVPVGTYTILPTLPEGYEFLEELD---VAVLfiNQSNVKKLT 1252 

Query: 1273 LIKraCAPLINALftEQIDIITQPVFYlSIAGTHLKNNYLAJS^ 1332

LINK L +AE + +YWA L+ Y LE A + N+ Q +D+A+A+ 

Sbjct: 1253 LINKVRLKELIAELRGLEETARYYlSlASPELQTAYAKALEDAmVYANKHNQAQVDSALAS 1312 

Query: 1333 LRESRQALNGKETDTSLLAKAILAETEIKGNYQFVNASPLSQSTYINQVQLAKNLLQKPN 1392 
L +R+ LNG+ TD L + T + N+ + NA Q Y V+ A+ +L + N

Sbjct: 1313 LVAAREQLNGQATDKEKLIAEVSNYTPTQANFIYYNftENTKQIAYDTAVRSAQLVLNQEN 1372 

Query: 1393 VTQSEVDKALENLDIAKNQLNGHETDYSGLHHMIIKANVLKQTSSiWQ^ 1452 
VTQ+ V++AL +L AK L+G +TD S L + ++VLK T +KY NAS+ K+ Y+ 
Sbjct: 1373 VTQAVVNQALftDLLAAKANLDGQKTDISALRSAVSVSSVLKATDAKYLISB^EN^^ 1432

Query: 1453 LIKKAELLLSNRQATQAQVEEliLNQIKATEQELDG RDRVSSAENYSQSLNDNDSLN 1508 

++ A+ +L + A+QA V++ L + + + ELDG + N + D ++ 

Sbjct: 1433 AVEAAKAILVDESASQASVDQALAVLTSftQAELDGVATSTinMVKEPAOTATDKmEGT^ 1492 

Query: 1509 TTPIN PP NQPQALIFK3CBMTKESEVaQiaiVLGOTSQTDNQKVKTNKL 1555 

PI+ PP N I+K + ++L + + NQ+ + +L 

Sbjct: 1493 PPPIDSEIVDVQAPPVKDTGNSEHVPIGQK-PNPQPTLPRPVTLQASLSSPNQEKQVTQL 1551 

Query: 1556 PKTGESTPKITYTILLFSLSMLGLATIKLKSIKR 1589
P TGE+ K L ++GL T+ L SI+R 

Sbjct: 1552 PNTGENDTK YYLVPGVIIGLGTL-LVSIRR 1580 

A related GBS gene <SEQ ID 8963> and protein <SEQ ID 8964> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
SRCFLG: 0
McG: Length of UR: 1
Peak Value of UR: 2.55
Net Charge of CR: 4
McG: Discrim Score: 2.60
GvH: Signal Score (-7.5): -0.78
Possible site: 35
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 36
ALOM program count: 1 value: -0.16 threshold: 0.0
INTEGRAL Likelihood = -0.16 Transmembrane 318 - 334 ( 318 - 334)
PERIPHERAL Likelihood = 2.54 1161
modified ALOM score: 0.53
icml HYPID: 7 CFP: 0.106

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 1535-1539 
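
The reported motif position can be cross-checked against the C-terminal residues shown in the alignment. The sketch below searches a sequence for the LPXTG pattern (X being any residue) and reports 1-based coordinates. The fragment is the end of the Query sequence in the alignment (residues 1551 to 1589 of SEQ ID 6296). The 20-residue shift to SEQ ID 8964 numbering is an assumption inferred from the transmembrane coordinates (338 - 354 versus 318 - 334), and the function name is hypothetical.

```python
import re

# L-P-any residue-T-G
LPXTG_RE = re.compile(r"LP.TG")

def find_lpxtg(sequence, first_residue=1):
    """Return (start, end, motif) tuples using 1-based inclusive numbering."""
    hits = []
    for match in LPXTG_RE.finditer(sequence):
        start = first_residue + match.start()
        hits.append((start, start + 4, match.group()))
    return hits

# C-terminal fragment taken from the Query lines of the alignment.
fragment = "KTNKLPKTGESTPKITYTILLFSLSMLGLATIKLKSIKR"
hits = find_lpxtg(fragment, first_residue=1551)

# Assumed offset between SEQ ID 6296 and SEQ ID 8964 numbering.
OFFSET = 20
shifted = [(start - OFFSET, end - OFFSET, motif) for start, end, motif in hits]
print(hits, shifted)
```

Under these assumptions the motif found in the fragment falls on the same residues as the position reported for SEQ ID 8964.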

The protein has homology with the following sequences in the databases:

50.5/67.5% over 1583aa
Streptococcus thermophilus
GP|9963932| cell envelope proteinase Insert characterized

ORF01603(361 - 5070 of 5370)
GP|9963932|gb|AAG09771.1|AF243528_1|AF243528(1 - 1584 of 1585) cell envelope proteinase {Streptococcus thermophilus}
%Match = 41.2
%Identity = 50.4 %Similarity = 67.4
Matches = 794 Mismatches = 498 Conservative Sub.s = 267

255 285 315 345 375 405 435 465 

KNaiX3TVLNLPQimL**KFRKi:i*KILIFyVLIVWIIMLQEKEIFMNTKQRFSIR^ 

I |: IhlllM mill :|:: I :|ll 
MKKKETFSLRKYKIGTVSVLLGAVFLFAGAPSVAA 
10 20 30 

495 525 552 576 606 636 666 696 

DSVINKPSDIAVEQQVKDSPTS - lANETPT- -NIWSSAIASTAQDISmOTKfiNNSPTETQPVAESHSQATETFSPVANQPV 
I : : II I h I M:| :== :|: I I :: : j : || | | 

DE-LTSIiVETKOTaTVPDAIVSESASESPVVEELVDTSVEATSTDVTTTD]ffiEETPGSEALENSANTEVETTQPAVETPA 
50 60 70 80 90 100 110 

726 756 780 810 840' 870 900 930 

ESTQEVSKTPLTKQNLAVKSTPAISKE - - TPQNIDSNKI ITVPKVSWrGYKGEGTWAI IDSGLDINHDALQLNDSTKAK 

I : = l : I ::| ::| : = ! Illllll llllllll = II II II II II I I II II I II h^H ■■ II 

ISEKKVEEEE--KLSVaDETTAimJC3EEAKPQNIDSlSITIITVPKVWYSGyK!GEGTVVM 

130 140 150 160 170 180 190 

960 990 1020 1050 1080 1110 1140 1170 

YQNEQQMNAAKAKAGINYGKWYNNKVI FGHNYVDVNTELKEVKSTSHGMHVTS lATANPSKKDTNELI YGVAPEAQVMFM 

|::h:: III III I h I : h II : I M II I II I III lllllllllll l|:: : I : I I I I I I I I I I I I 

YKSEKEIEAAKEAAGITYGEWFNDKVVFGYimDVim/LKEEDKRSHGMHVTSIATGNPTQPVAGQ]^ 

210 220 230 240 250 260 270 

1200 1230 1260 1290 1320 1350 1380 1410 

RVFSDEKRGTGPALYVKAIEDAVKLGADSINLSLGGANGSLVNADDRLIKALEMaRLAGVSWIAAGNDGTFGSGRSKPS 

Hill L II lllllllllllllllllllllllllllhll : M II I I I I I I I I I I I I I I I I I I I II 
RVFSDLKATTGAALYVKAIEDAVKLGADS INLSLGGflNGSVVMMNENVTAAIEAARRAGVSVVIAAGNDGTFGSGHSNPS 
290 300 310 320 330 340 350 

1440 1470 1500 153,0 1560 1590 1620 1650 

ALYPDYGLVGSPSTAREAISVaSYSMTTLWKOTNIIGLENN^^ 

I lllllllhini :|IIIIIIIII|: :|| llllllll =11 I = = = =1= I l|:||:|:||= I I =1 
ADYPDYGLVGAPSTAHDAISVASYim'OTGSKVINIIGLENNADIJSnfGKSSFDNPEKSPVPFEIGKEYEYVYAGIGQM 

370 380 390 400 410 420 430 

1680 1710 1740 1770 1800 1830 1860 1890 

YKDKTIiNGKIALIERGDITFTKKV\mAINHGAVGAIIFNNKAGEaNLTMSLDPEASAIPAIFTQKEFGDV^ 

: I IMIhll ll|::|: II llll :|||:: lllhH II I ll|::| llh II hill I 
FDGLDLTGKLALIKRGTITPSEKIAmTAAGAVGWIFNSRPGEftNVSMQLDDTAIAIPSVFIPLEFGEaLftANSYKIJ^ 
450 460 470 480 490 500 510 

1920 1950 1980 2010 2040 2070 2100 2130 

miKNKQANPNAGVLSDFSSWGLTADGQLKPDLSAPGGSIYAAINDNEYDMMSGTSMASPHVAGATALVKQYLLKEHPEL 

II : •■ II l|:|lllllli|:|l|:|||l|:|II|:||lllllhl I llllllllllll Illllll : I 
mraTDIRPNPEAGLLSDFSSWGLSADGELKPDIAAPGGRIYAAIiroimYAimQGTSMASPHVAGAA^^ 

530 540 550 560 570 580 590 

2160 2190 2220 2250 2280 2310 2340 2370 

KKGDIERTVKYLLMSTAKAHMKDTGAYTSPRQQGAGIIDVAAAVQTGLYLTGGENNYGSVTLGNIKDKISFDV^^
:|| Ihllllllllhlhl llllllllllllll III: IIIIMI |: ll|:|IM::| II IMII
SPQEIEALVKHLLMSTAKAHVNKETTAYTSPRQQGAGIIDTAAAISTGLYLTG-EDGYGSITLGNVEDTFSFTVTLHNIT 
610 620 630 640 650 660 670 

2400 2430 . 2460 • 2490 2520 2550 2580 2610 

rorAKDLHYTTYLNTDQVKIX3FVTLAPQQLGTFTGKTIRIEPGQTQTIT 

I |:|:| III: I ::::::: |:||::| | : : | :| I I I = I I I : I I | I I 

NEDKTIjl^STQLTTDTAQKRIDHLGSTSISiyDSTOKOTVKftNSSTTVTIim)ASSFAEELT^ 

690 700 710 720 730 740 750 

2640 2670 2700 2730 2754 2784 2814 2844 

DGGEVLSIPYVGFKGEPQralEVLEKSIYKLVaNKEKGFYFQP--KiQmffiVPGSEDYTAL^^^ 

I |:::||||ll|:||llll llh II hh : lllhl III I II hi M lllll I =1 
DIXSDIVSIPYVGFRGEFQNIAVLEEPIYNLIiUDGKGGFYFEPVTAQPNTVDISHH^ 

770 780 790 800 810 820 830 

2874 2904 2934 2964 2994 3024 3054 3084 

LGSYKSIDGKWILQLDQKGQPHIAISPNDDQNQDAVAWGVFLRNFNm.RAKVYRADDVNLQKPLVA/SAPQAGDK^ 

l|::|: I ::|:||: hllllllll I ll|:: llllll|: :| I II III III I Ihllll III 

IXSTFKmaGYFVLELDESGKPHIMSPNGDDNQDSLVFKGVFLKNYTDLVASWAADDTERTNPLVffiSQPQSGD^^ 

850 860 870 880 890 900 910 

3114 3144 3174 3204 3234 3264 3294 3324 

NTENPKSTFLYDTEWKGTTTDGIPLEDGKYKYVLTYYSDVPGSKPQQMVFDITLDRQAPTLTTATYDKDRRIFKARPAVE 



NPKNPKSSIIYPTEWNGTDSDGNAIJUDGKyQYVLTYSSKVPGftAVQTMIFDVIIDRESPVITTATYDETNFTFNPRPAIE 
930 940 950 960 970 980 990 

3354 3384 3414 3444 3474 3504 3534 3564 

HGESGIFREQVFYLKKDKDGHYNSVLRQQGEDGILVEDNKVFIKQEKDGSFILPKEVNDFSHVYYTVEDYAGNLVSAKLE 

lll|::milll II:: : | llUj: | || :: M HIIIIIUI: hi 

KiGESGLYREQVFYLVADaSG-VTTIPSLLKNGDVWSDNK:VFVAQ^roDGSFTLPLDIADISKFYyTVEDYAGNISYEK^ 
1010 1020 1030 1040 1050 1060 1070 

3594 3624 3654 3684 3711 3741 3771 3801 

DLINIGimiGLVimCVFSPEUISimjlDFSYSVKDDKGNIIKK-QHHGKDLKn^ 

:lhllh 111 1 =: : 11 I 1 lllll h 1 h = =1 = = 1111111111111111 1 = = 1 
NLISIGlffiKBLVTVNILDKIjraSPVPILFSYSVTDETGKIVaELPRYAGDTSVLiaiPro 

1080 1090 1100 1110 1120 1130 1140 1150 

3831 3861 3891 3921 3951 3981 4011 4041 

VTVTISEKDSLKDVLFKA/NLLKKAALLVEFDKIiPKGATVQLOTKTNTVTO 

III I =1 =1111 II lh= I III hhllll = II I II lllll =111 I = III I 

AWriIiEDNST2ffiVNFYWLKDBffl]Sn^LIDinaU^PSGSTIQLVTlffiGQAIQnPBIRKYSK^ 

1160 1170 1180 1190 1200 1210 1220 1230 

4071 4101 4131 4161 4191 4221 4251 4281 

YSTLENLDDLLVSVKEDQVNLTKliTLINKAPLINAIAEQTDIITQPVFYmGTHLKNNYIJaJLEKAQTLIKNRVEQ 

I I I I I hi : I h I I I I I I I I : I I = : I I I h I I I h = 1= I =1 
YEFLEELD- - -VAVLaNQSN\naa.TLINKVaLKELIAEIiftGLEETARyY15aSPELQTAYAKAIiEI^^ 

1240 1250 1260 1270 1280 1290 1300 

4311 4341 4371 4401 4431 4461 4491 4521 

NAIAALRESRQALNGKETDTSLIiAKAILAETEIKGNYQFVNASPLSQSTYINQVQI^Km.LQKPNVTQSEVDKALENLDI 

:h|:l :h llh jl 1 : I = h : II 1 | j: |: =1 : |||h h:ll =1 

SAIiASLVAAREQI^GQATDKEKLIAEVS)!reTPTC3ANFIYYNAENTKQIAYDTAVRSaQLVMQEN^ 

1320 1330 1340 1350 1360 1370 1380 

4551 4581 4611 4641 4671 4701 4731 4761 

AKNQENGHETDYSGLHHMIIKANVLKQTSSKYQNASQFAKENYNNLIKKAELLLSmQATQAQVEELM 

II h|::|| I I = = = 111 I :|l llh h h :: h =1 = hll 1== I = = = UN 
AKft]sn^GQKTDISALRSAVSVSSVLKATDAKYIJ)JASENVKQAYDQAVEAAKAILVDESASQASVDQAIAVLTSAQAE^ 

1400 1410 1420 1430 1440 1450 1460 

4779 4809 4839 4845 4860 4890 4920 4950 

RDRVSSAENYSQSUStDNDSLNTTPIN PP NQPQALIFKKGMTKESEVAQKRVLGVTSQTDNQKV 

I = I == Ih II I = I :| : : : : I : : Ih 

VATSTNDAKEPAmATDKTOEGTVTPPPIDSErVDVQAPPVTCD^
1480 1490 1500 1510 1520 1530 1540

4980 5010 5040 5070 5100 5130 B160 5190 

KTNKIiPICrGESTPKITyTILLFSLSMLGLATIKLKSIKRE*OTLKNRARHQLIiAINS**LVPF* 
: :|1 111: | | : || : ||

QVTQLENTGENDTK- -YYLVPGVIIGLGTLIiVSIRRHKEEV 
1560 1570 1580 

A related GBS nucleic acid sequence <SEQ ID 10965> which encodes amino acid sequence <SEQ ID
10966> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 6297> which encodes the amino acid 
sequence <SEQ ID 6298>. Analysis of this protein sequence reveals the following: 

LPXTG motif: 1614-1619 

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -4.46 Transmembrane 1623 -1639 (1621 -1641)

Final Results

bacterial membrane --- Certainty=0.2784 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAG09771 GB:AF243528 cell envelope proteinase [Streptococcus thermophilus]
Identities = 465/1125 (41%), Positives = 668/1125 (59%), Gaps = 61/1125 (5%)

Query: 1 VEKKQRFSLRKYKSGTFSVLIGSVFLW-TTTVAADELSTMSEPTITNHftQQQRQHLTNT 59 
++KK+ FSLRKYK GT SVL+G+VFL +VAADEL+++ E + T

Sbjct: 1 MKKKETFSLRKyKIGTVSVLLGAVFLFAGAPSVaADELTSLVETKVEA T 49 

Query: 60 ELSSAESKSQDTSQITLKTNREKEQSQDLVSEPTTTELADTDaaSMaNTGSDATQK^ 119 
+ S+S S + E+ D E T+T++ TD GS+A + SA 

Sbjct: 50 VPDAIVSESASESPW EELVDTSVEATSTDVTTTDNEE-ETPGSEALENSA-- 99

Query: 120 PPVNTDVHDWVKTKGAWDKGYKGQGKWAVIDTGIDPAHQSMRISDVSTAKVKSKEDMLA 179 

NT+V T+ A + + KV + + ++D +TA +E 
Sbjct: 100 ---NTEVET---TQPAVETPAISEKKV EEEEKLSVADETTAIXNQEE 140 

Query: 180 RQKAAGINYGSWINDKVWAHNYVENSDNIKE-NQFEDFDEDWENFEFDAEBEPKAIKKH 238 

K 1+ + I V+ Y + + D D D +. + A+ K+ K+ 

Sbjct: 141 -AKPQNIDSNTIITVPKVWYSGYKGEGTVVailDSGLDVDHDVLHISDLSTAKYKSEKEI 199 

Query: 239 KIYRPQSTQAPKETVIKTEETDGSHDIDWTQTDDDTKYESHGMHVTGIVAGNSKEAAATG 298

+ + + E + G + +D + SHGMHVT I GN + A G 

Sbjct: 200 EAAKEAAGITYGEW-ENDKVVFGSNYVDVNTVLKEEDKRSHGMHVTSIATGNPTQPVA-G 257 

Query: 299 ERFLGIAPEAQVMFmJVFANDIMGSAESLPIKAlEIfflVALGftDVINLSLGTANGAQLSGS 358 
. + G+APEAQVMFMRVF++ + +L++KAIEnAV LGAD INLSLG ANG+ ++ +

Sbjct: 258 QLMYGVAPEAQVMFMRVPSDLKATTGAALYVKAIEDAVKLGADSINLSLGGANGSVW^ 317 

Query: 359 KPLMEAIEKAKKAGVSWVAAGNERVYGSDHDDPLATNPDYGLVGSPSTGRTPTSVAAIN 418 
+ + AIE A++AGVSW+AAGN+ +GS H +P A PDYGLVG+PST SVA+ N 

Sbjct: 318 ENVTAAIEAARRAGVSWIAAGNDGTFGSGHSNPSADYPDYGLVGAPSTAHDAISVASYN 377

Query: 419 SKWIQRL^m?KELENRADL^lHGKAIYSESVDFKDIKDSLGYDKSHQFAYVKESTDAGYN 478 

+ V +++ + LEN ADtN+GK+ + ++ + + +G + + +A + +++D ++ 
Sbjct: 378 NTTVGSKVaNllGLENNADLNYGKSSF-DNPEKSPVPFEI6KEYEYVYAGIGQASD--FD 434 



Query: 479 AQDWGKIALIERDPNKTYDEMIALAKKHGAI/3VLIENNKPGQSNRSMRLTANGMGIPSA 538 

D+ GK+ALI+R T+ E lA A GA+GV+IFN++PG++N SM+L + IPS 
Sbjct: 435 GLDLTGKIALIKRG-TITFSEOANATAAGAVGWIFNSRPGEANVSMQLDDTAIAIPSV 493

Query: 539 FISHEFGKAMSQLNGNGTGSLEFDSWSKAPSQKGNEMSfflFSNWGLTSDGYLKPDITAPG 598

FI EFG+A++ + + F++ P+ + ++ FS+WGL++DG LKPD+ APG 

Sbjct: 494 FIPLEFGEALAA--.--NSYKIAENNETDIRPNPEAGLLSDFSSWGLSflDGELKPDIiaAPG 549 

Query: 599 GDIYSTVOTJNHYGSQTGTSMaSPQIAGaSLLWQYLEKTQPKTDPKEKIADIVKm 658

G IY+ NDN Y + GTSMASP +AGA++LVKQYL T P ++I +VK+LLMS A 
Sbjct: 550 GAIYAAINDNDYANMQGTSMASPHVAGAAVLVKQYLLATYPTKSPQEIEALVKHLLMSTA 609 

Query: 659 QIHWPETKITTSPRQQGAGLLNIDGAVTSGLYVTGKDNYGSISLGNITDTMTFDVTVHN 718 

+ HVN ET TSPRQQGAG+++ A+++GLY+TG+D YGSI+I1GN+ DT +F VT+HN 
Sbjct: 610 KAHVKKETTAYTSPRQQGftGIIDTAAAISTGIjYLTGEDGYGSITLGamiDTFSFTVTIJIN 669 

Query: 719 LSNKDKTLRYDTELLTDHVDPQKGRFTLTSHSLKTYQGGEVTVPANGKVTVRVTMDVSQF 778 

++N+DKT1, Y T+L TD + TS S +++ +VTV AN TV + +D S F 

Sbjct: 670 iaTraDKTLNYSTQLTTDTAQKRIDHI^TSISRDSTO--KVTVKANSSTTTO 727 

Query: 779 TKELTK^^PlSKSYYLEGFWFRDSQDDQtmVNIPFVGFKGQFENIAVAEESIYRLKSQGK 838 

+ELT M NGYYLEGFVRF D DD + V+IP+VGF+G+P+NLAV EE lY L + GK 
Sbjct: 728 AEELTGL^^aIGYYIIEGFVRFrovaDr)G-DIVSIPYVGFRGEFQ^^^VLEEPIYNLIADGK 786 

Query: 839 TGFYFDE-SGPKDDIYVGKHFTGLOTLGSETOTSTKTISDNGLHTLGTFKNRDGKFILEK 897 

GFYF+ + + + + H+TGLVT +E ST SD+ + TLGTFKN G F+LE 
Sbjct: 787 GGFYFEPVTAQPNTVDISHHYTGLVTGSTELIYSTDKRSDSAIKTLGTFKNKAGYFVLEL 846 

Query: 898 NAQGNPVLAISPNGDNNQDFAAFKGVFLRKYQGLKASVYHASDKEHKNPLWVS-PESFKG 956 

+ G P LAISPNGD+NQD FKGVFIiR Y L ASVY A D E NPLW S P+S G 
Sbjct: 847 DESGKPHIAISENGDDNQDSLVFKiGVFLRNYTDLVASVYAaDDTER™PLWESQPQS--G 904 

Query: 957 DKN-FNSDIRFAKSTTLLGTAFSGKSLTGAELPDGHYHYWSYYPDWGAICRQEMTFDMI 1015 

DKN ++ + + KS+ + T ++G G L DG Y YV++Y V GA Q M FD+I 
Sbjct: 905 DKNIySC3^WKNPKSSIIYPTEM^GTDSDGNALADGKYQYVLTYSSKVPGaAVQTMIFDVI 964 

Query: 1016 IDRQKPVLSQATFDPETNRFKPEPLKDRGLftGVRKDSVFYLERKDNKPYTVTIND^ 1075 

+DR+ PV++ AT+D F P P ++G +G+ ++ VFYL + T+ V 

Sbjct: 965 IDRESPVITTATYDETNFTFNPRPAIEKGESGLYREQVPYLVADASGVTTIPSLIjKNGDV 1024 

Query: 1076 SVEDNKTFVERQADGSFILPLDKRKLGDFYYMVEDFAGNVAIAKL 1120 

+V DNK FV + DGSF LPLD A + FYY VED+AGN++ K+ 
Sbjct: 1025 TVSDNKVFVAQNDDGSFTLPLDLADISKFYYTVEDYAGNISYEKV 1069 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 543/1676 (32%), Positives = 821/1676 (48%), Gaps = 158/1676 (9%)

Query: 24 KQRFSIRKYKLGAVSVLLGTLFFLGGI'mvaAD--SVI]!IKPSDIAVEQQVKDSPTSI 78 

KQRFS+RKYK G SVL+G++F + T VARD S +++P+ QQ T+ 

Sbjct: 4 KQRPSLRKyKSGTFS^7LlGSVFIlVM-TTTVAADELSTMSEPTITNHAQQQRQHLT^ 62 

Query: 79 ANETPTNNTSSALASTAQD NLVTKANNSPTETQPVAESHSQATETPSPVANQPVE 133

+ E+ + +TS T ++ +LV++ + A + ++ A+ P 

Sbjct: 53 SAESKSQDTSQITLKTimEKEQSQDLVSEPTTl^LRDTDAASMANTGSDATQKSASLPPV 122 

Query: 134 STQEVSKTPLTKQ--NLAVKSTPAISKETPQNID-SNKIITVPKVWNTGYKGEGTWAI- 189 

+T +V TK + K + ID +++ + + V K + ++A 

Sbjct: 123 irr-DVHDWVKTKGAWDKGYKGQGKVVAVIDTGIDPAHQSMRlSIWSTAKVKSK^ 181 

Query: 190 IDSGLDIN HDftLQLNDSTKAK -YQNEQQMNftAKAKaGINYGKW 231 

1+ G IN H+ ++ +D+ K ++N + A+ KA I K 

Sbjct: 182 KAAGINYGSWIITOKVVFAHNYVENSDNIKENQFEDFDEDWENPEFDAEAEPKA-IKKHKI 240 

Query: 232 YN HK7IFGHNYVDVNTELKEVKSTSHGMHVTSIATANPSKKD-TNEI. 277 

Y + G + +D + K SHGMHVT I N + T E 

Sbjct: 241 YRPQSTQAPKETVIKTEETDGSHDIDWTQTDDDTKYESHGMHVTGIVAGNSKEAAATGER 300 

Query: 278 IYGVAPEAQ\mFMRWSDEKRGTGPALyVKaiEDAVKK3ADSINLSIiGGANGSL\nJADDR 337 

G+APEAQVMFMRVF+++ G+ +L++KAIEDAV LGAD INLSLG ANG+ ++ 
Sbjct: 301 FLGIAPEAQVMFMRVFANDIMGSAESLFIKAIEDAVALGADVINLSLGTANGAQLSGSKP 360

Query: 338 LIKALEMRRLAGVSWIAAGNDGTFGSGASKPSALYPDYGLVGSPSTAREAISVASYNNT 397

L++A+E A+ AGVSW+AAGN+ +GS P A PDYGLVGSPST R SVA+ N+ 
Sbjct: 361 LMEAIEKaKKaGVSVWaaGNERVYGSDHDDPLATNPDYGLVGSPSTGRTPTSVAAINSK 420

Query: 398 TLVNKVFNIIGLEimRlin^GIiAAYA---DPKVSDKTFEVGKQYDYVFVGKGro 454 

++ ++ + LEW +LN+G AY+ DK + K + + +V+DY + 
Sbjct: 421 WVIQRLMTVKELENRADLNHGKAIYSESVDFKDIKDSLGYDKSHQFAYVKESTDAGYNAQ 480 

Query: 455 TIjNGKIALIERG-DITFTKKSAnilAIlffiGAVGAIIETOKAGEANLTMSLDPEM 513

+ GKIALIER + T+ + + A HGA+G +IENNK G++N +M L IP+ F 

Sbjct: 481 DVKGKIALIERDPNKTYDEMIAIJUCKHGAIX3VLIENNKPGQSiroSNnU,TM^ 540 

Query: 514 QKEFGDVLAKNNYK IVFNNIKNKQANPNAGVLSDFSSWGLTADGQLKPDLSAPGGS 569 

EFG +++ N + F+++ +K + ++ FS+WGLT+DG LKPD++APGG

Sbjct: 541 SHEFGKAMSQENGlSGT6SLEFDSWSKAPSQEG^ffi^ffl^HFSl™GLTSIX3YIlK^^ 600 

Query: 570 lYAAINDNEYDMMSGTSMASPHVAGATALVKQYLLKEHPELKKGDIERTVKYLLMSTAKA 629 
iy+ NDN Y +GTSMASP +AGA+ LVRQYL K P L K I VK LLMS A+ 
Sbjct: 601 lYSTYlTONinfGSQTGTSMaSPQIAGaSLLVKQYLEKTQHSmPKEKIADIVK^ 660

Query: 630 HUJKDTGAYTSPRQQGAGIIDVAAAVQTGLYLTGGENOTGSVTMNIKDKISroVTVHNI 689 

H+N +T TSPRQQGAG+4-++ AV +GLY+TG ++NYGS++LGNI D ++FDVTVHN+ 
Sbjct: 661 HVNPETKTTTSPRQQGAGLLNIDGAVTSGLYVTG-KDNYGSISLGNITDTMTFDVTVHNL 719 

Query: 690 NKVAKDLHYTTYI15TDQV--KDGFVTLAPQQIiGTFTGKTIRIEPGQTQTITIDIDVSKYH 747 

+ KLYTLTDV+GTL LT+G++ T++ +DVS++ 

Sbjct: 720 SNKDKTIiRYDTELLTDHVDPQRGRFTLTSHSLKTYQGGEVTVPANGK\mmVTMDVSQFT 779 

Query: 748 DMLKKVMPNGYFLEGYVRFTDPVDGG-EVLSIPYVGFHGEFQNLEVLEKSIYKLVANKEK 806

h K MPNGY+LEG+VRF D D ++IP+VaFKB+F+NIi V E+SIY+L + + 

Sbjct: 780 KELTKQMENGYYLEGFVRFRDSQDIXSIilimWIPFVGPKGQFENLAVaEESIYRLKSQGKT 839 

Query: 807 GFYFQPK-QT^ffiVPGSEDYTALMTTSSEPIYSTDGTSPIQLKALGSYKSIDGKWILQLDQ 865 
GFYF +++ + +T L+T SE ST S L LG++K+ DGK+IL+ +

Sbjct: 840 GFYFDESGPKDDIYVGKHFTGLOTLGSEmVSTKTISDNGLHTLGTFKNADGKFILEiaja 899 

Query: 866 KGQPHLAISE^^^DQNQDAVAVKGW]:lRNENlILRaKVYRADIJVNLQKPLWVSAPQ-AGDKN 924 

+G P LAISPN D NQD A KGVFIiR + L+A VY A D + PLWVS GDKN 
Sbjct: 900 QGNPVLAISPNGDNNQDFAAFKGVFLRKYQGLKASVYHASDKEHKNPLWVSPESFKGDKN 959

Query: 925 YYSGNTENPKSTFLYDTEWKGTTTDGIPLEDGKYKYVLTYYSDVPGSKPQQMVFDITLDR 984 

+ S + KST L T + G+ G LD6Y YV++YY DV G+K Q+M FD+ IjDR 
Sbjct: 960 ENS-DIRFAKSTTLLGTAFSGKSLTGAELPDGHYHYWSYYPDVVGAKRQEMTFDMILDR 1018 

Query: 985 QAPTLTTATYDKDRRIFKARPAVEHGESGIFREQVFYLKKDKDGHYNSVLRQQGEDGILV 1044 

Q P L+ AT+D + FK P + G +G+ ++ VFYL++ KD +V + V 

Sbjct: 1019 QKPVLSQATFDPETNRFKPEPriKDRGLAGTOKDSVFYLER-KDMKPVT\rriOT^ 1077 

Query: 1045 EDNKVFIKQEKDGSPILPKEViroFSHVYYTVEDYAiGNLVSaKLEDLINIGN^ 1104

EDNK F++++ DGSFILP + YY VED+AGN+ AKL D + + +K+ 

Sbjct: 1078 EDNKTFVERQADGSFILPIJJKRKrflDFYYMVEDFAGMVaiAKLGDHIiPQTLGKTPIKLK^ 1137 

Query: 1105 FSPELNSNVDIDFSYSVKDDKGNIIKKQ HHGKDLNLLKLPFGTYTFDtFLYDEE 1158 

++ + + ++Q H + +L DF+E

Sbjct: 1138 TDGNYQTKETLKDNLEMTQSDTGLVTMQAQLAWHRNQPQSQLT KMNQDFFISPNE 1193 

Query: 1159 RfiNLISPKSVTOTISEKDSLKDVLFKWSTLLKKAALLVEFDKLLP KGATVQLVTKT 1213 

N K K+++ + Ii VN+ K + K P GA+V + T 

Sbjct: 1194 DON KDFVAFKGLKNimNDL-TVNVYAKD DHQKQTPIWSSQAGASVSAIEST 1244

Query: 1214 )mVDLPKATYSPTDYGKNIPVGDYRUmX.PSGYSTLENLDDLLVSVKEDQVNLT--KL 1271 

A Y T G + GDY+ VT + E+ +SV + + +T + 

Sbjct: 1245 AWY6ITARGSKVMPGDYQYVVTYRDEHGK-EHQKQYTISVNDKKPMITQGRF 1295 



Query: 1272 TLINK APLINAIAEQTDIITQPVPYNAGTHLKNimANLEKAQTLIKNRVEQTSID 1327

IN P + + 1+ + VFY A KN + TD 
Sbjct: 1296 DTINGVDHFTPDOTKALDSSGIVREEVFYLA---KKNGRKFDVTEGKDGI TVSD 1346

Query: 1328 NAIAALRESRQALNGKETDTSLLAKAILAETEIKGNYQFVNASPL SQSTYIN 1379

N++ + +DL+ +GNF L +N 

Sbjct: 1347 NKA7YIPKNPDGSYTISKRDGVTLSDYYYLVEDRAGNVSFATLRDLKAVGKDKAVVNFGLD 1406 

Query: 1380 -QVQIJ«COT.LQKPim'QSEVDKMiElSrLDIAKNQ™GHETDYS--GLHm 1436 

V K ++ + + K +ENL+ NN Y + + NKS 

Sbjct: 1407 LPVPEDKQITOFTYLVRDjyDGICPIENLEYYMNSGNSLILPYGKYTVELLTYDTNAAKLES 1466 

Query: 1437 SKYQNASQFAKENYNKLIKKAELIJjSIsIR QATQAQVEELLNQIKATEQEL- 1485

K + + A N+ + K +L +++ + ++ ++ +Q+ EQ L 

Sbjct: 1467 DKIVSFTLSADNNFQQVTFKITMLa.TSQITRHFDHr<LPEGSRVSLKTAQDQLIPLEQSLY 1526 

Query: 1486 DGRDRVSSAENYSQSIJSnairoSiaOTPINPPNQPQaLIFKKGMTKES 1531 

+G V + + N +NT P N ++ + K G +S

Sbjct: 1527 VPKAYGKaT/QEGTYKVWSLPKGYRIEGNTKVimiP-NEraELSLRLVK^ 1585 

Query: 1532 EVAQKRVLGVTSQTDNQKVKTNKLPKTGESTPKITYTILLFSLSMLGLATI 1582 

+Q T LP TGE K+ + + L +LGL + 

Sbjct: 1586 KVMSKIMSQALTASATPTKSTTSATAKaLPSTGE---KMGLKrJlIVGLVLLGr,TCV 1638

SEQ ID 8964 (GBS92) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 31 (lane 2; MW 48kDa). 

GBS92-His was purified as shown in Figure 199, lane 9.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2038 

A DNA sequence (GBSx2149) was identified in S.agalactiae <SEQ ID 6299> which encodes the amino
acid sequence <SEQ ID 6300>. This protein is predicted to be an AzlC family protein. Analysis of this protein
sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.80 Transmembrane 212 - 228 ( 196 - 230)
INTEGRAL Likelihood = -7.27 Transmembrane 167 - 183 ( 159 - 185)
INTEGRAL Likelihood = -5.68 Transmembrane 189 - 205 ( 188 - 210)
INTEGRAL Likelihood = -2.28 Transmembrane 17 - 33 ( 13 - 34)
INTEGRAL Likelihood = -1.06 Transmembrane 135 - 151 ( 135 - 151)
INTEGRAL Likelihood = -1.01 Transmembrane 61 - 77 ( 60 - 77)

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
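
Each INTEGRAL entry in the analysis above gives a likelihood score, a predicted transmembrane segment, and a second, enclosing range in parentheses. The sketch below collects these entries and orders the segments along the protein. The function name, the dictionary keys and the interpretation of the parenthesised range as an extended segment are assumptions made for illustration.

```python
import re

# One INTEGRAL line: likelihood, core segment, then a range in parentheses.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text):
    """Return predicted segments sorted by the start of the core segment."""
    segments = []
    for m in TM_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),  # assumed meaning
        })
    return sorted(segments, key=lambda seg: seg["core"][0])

example = """
INTEGRAL Likelihood = -7.80 Transmembrane 212 - 228 ( 196 - 230)
INTEGRAL Likelihood = -7.27 Transmembrane 167 - 183 ( 159 - 185)
INTEGRAL Likelihood = -5.68 Transmembrane 189 - 205 ( 188 - 210)
INTEGRAL Likelihood = -2.28 Transmembrane 17 - 33 ( 13 - 34)
INTEGRAL Likelihood = -1.06 Transmembrane 135 - 151 ( 135 - 151)
INTEGRAL Likelihood = -1.01 Transmembrane 61 - 77 ( 60 - 77)
"""

segments = parse_tm_segments(example)
# The most negative likelihood marks the most confidently predicted segment.
strongest = min(segments, key=lambda seg: seg["likelihood"])
print(len(segments), strongest)
```

Sorting by position rather than by score gives the order in which the predicted segments occur in the sequence, which is convenient when sketching a membrane topology.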

A related GBS nucleic acid sequence <SEQ ID 10235> which encodes amino acid sequence <SEQ ID
10236> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF10212 GB:AE001921 AzlC family protein [Deinococcus radiodurans]
Identities = 72/224 (32%), Positives = 117/224 (52%), Gaps = 8/224 (3%)

Query: 6 FKEGVKDALPTALGYISIGLAF6IVASASDLSAIEVGLMSALVYGGSAQFAMCALLLAKA 65 

F +G + +P LG + LA+ + A A+ LS + LMS + G++QFA L A A 
Sbjct: 7 FWQGFRALVPLWLGTVPFALAYAVTARAAGLSVGDTCLMSLTTFAGASQFAAAGLFGAHA 66 



Query: 66 DLMTITMTVFLVNLRNMLMSLHATTIFKSAHLMNQLAIGTLITDESYGV-LLGEALHHKV 124 
++I +T FL+N R++L L + L ++ +TDE+YGV ++ A
Sbjct: 67 GGLSIVLTTFLIjNARHLLyGLSIjy«ELRLT-LPQRVVMi.QFLTDEAYGVAWSGARLPGG 125

Query: 125 VSPSWMHGNNVMSYLTWISTIIGTLLGSTIPNPEMFGLDFALVAMFIGLFVFQLFGMLS 184 
++ +++ G + YL+W +ST++G h GS- +P PE G+ F+GL V ++ 
Sbjct: 126 LTFAPLLGAELSLYLSWWSTLIXSftliftGSVLPPPEQLGVGVVPPIAFriGLLV PLW 181

Query: 185 DGKKLWra^SVGLSYFIjLATFLSGALSVLLATVVGCSVGVVI, 228 

D RL + V + GL + L+ L G L +LLA V G +G L 
Sbjct: 182 D--RLSLLVMiABGLa3WaLSRVLPGGLVILriaGViGGftLIiGaAL 223 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2039 

A DNA sequence (GBSx2150) was identified in S.agalactiae <SEQ ID 6301> which encodes the amino
acid sequence <SEQ ID 6302>. Analysis of this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3794 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2040 

A DNA sequence (GBSx2151) was identified in S.agalactiae <SEQ ID 6303> which encodes the amino
acid sequence <SEQ ID 6304>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5087 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10233> which encodes amino acid sequence <SEQ ID
10234> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04157 GB:AP001508 homosystein methyl transferase [Bacillus halodurans] 
Identities = 397/751 (52%) , Positives = 519/751 (68%) , Gaps = 14/751 (1%) 

Query: 10 SNLGYPRLGEQREWKQAIEAFWAGNLEQKDLEKQLKQLRINHLKKQKEAGIDLIPVGDFS 69 

SNLGYPR+GE REWK+A+E+FWA + ++ L +K+LR+NHL+ Q+E +DLIPVGDF+ 
Sbjct: 4 SNLGYPRIGENREWKKALESFWflNDTTEEQLLATMKELRLNHLRVQQEQEVDLIPVGDFT 63 

Query: 70 CYDHVLDLSFQFNVlPKRFDEY--ERNLDLyFAIARGDKDNVASSMKKWEIimiYHYIVPE 127

YDHVIjD++ F +IPKRF + L YFA+ARG K+ A M KW+NTNYHYIVPE 
Sbjct: 64 LYDHVLDMAWFGIIPKRFLQQGDTPTLSTYFAMARGSKNAQACEMTKWYWTNYHYIVPE 123

Query: 128 WEVETKPHLQ]!mLLDLYLESyiEWGDKaKPVITGPITYVSLSSGIVD--FEaTVQRLLP 185 

+ P L N L+ YLEA+ +G KPVT GP ++V L+ G + + T+Q LLP 
Sbjct: 124 LH-DAAPRLTKNAPLEAYLEAKNELGIDGKPVILGPYSFVKLAKGYEEDKLQETIQSLLP 182 

Query: 186 LYKQVFQDLIDAGATYIQIDEPIFVTDEGELLVDIAKSVYDFFAREVPQAHFIFQTYFES 245 

LY QV Q+L+DAGA IQ+DEP VT + + +Y+ + A QTYF++ 

Sbjct: 183 LYIQVIQELVimGARSIQVDEPSLVTSISAREMaLVTRIYEQINEAIADAPLPLQTYFDA 242 

Query: 246 AVCLDKLSKLPVTGFGLDFIHGRAENLRAVKQ-GLFREKELFAGIVHGRNIWAVN^ 304 

+++ LPV G GLDF+HG A+NL A++ G +K L AGI++GRNIW NL E 
Sbjct: 243 VTPYEEWSLFraSIGLDFVHGGAKNLEALRTPGFPEDKVLAAGIIDGKNIWISlTCiRE 302 

Query: 305 ALI,EEIGPFVK--RLTLQPSSSLLHVPVTTKYETHLDPVLKNGLSFADEKLKELELLASA 362

L+ ++ V RL LQPS SLLHVPVTTK E LDP L L+FA+EKL EL L 
Sbjct: 303 ELVHQLEQHVAKDRLVLQPSCSLLHVPVTTKREEKLDPTLLGVLAFANEKLTELHTLKQL 362 

Query: 363 FDGNKTKGYHEALSR FSALQAADFRHVALESL-AEVKLERSPYKLRQALQAEKLQL 417 

GN+ + EAL +AL+ + +R A S E K + R+ LQ EK QL

Sbjct: 363 AAGNEAE-VKXALEANDDALAALEKSGWRSQAATSHlSn^ENKKRPQSEl^RRPLQEEKWQL 421 

Query: 418 PILPTTTIGSFPQSPEIRKKRLAWKRGNLSDSDYKDFIKTEIRRWIAIQEDLDLDVLVHG 477 
P+LPTTTIGSFPQ+ ++R+ R W++G LS +Y+ +K+ I +WI IQE+L LDVLVHG 
Sbjct: 422 PLLPTTTIGSFPQTKDVRRTRSLWRKGELSTVEYERTMKSYIEKWINIQEELGLDVLVHG 481

Query: 478 EPERVDMVEFFGQKLAGFTTTKLGWVQSYGSRAVKPPIIYGDVKHIQPLSLEETVYAQSL 537 

EFER DMVEFP6+KL GF T GWVQSYGSR VKPPIIYG+V +P+++ ETVYAQSL 
Sbjct: 482 EFER^ID^nmFFGEKLIX3FAFTANGWVQSYGSRCVKPPIIYGWSFTEPMTVAETVyAQSL 541 


Query: 538 TKKPVKGMLTGPITITNWSFERDDISRSDLFNQIALAIKDEIQLLEQSGIAIIQVDEAAL 597 

T KPVKGMLTGP+TI NWSF RDD+ + + +QIA A+ E+ LE++GI +IQ+DE A+ 
Sbjct: 542 TDKPVKGMLTGPVTILNWSFVRDDLPLTVIAHQIAEALTHEVTALEEAGIEMIQIDEPAI 601 

Query: 598 REGLPLRQQKQQAYLDnAVAAPKIATSSVKDETQIHTHMCYSKFDEIIDSIRAIinADVIS 657

REGLPL+ + QQ YLD AV+AF+ + + VK TQIHTHMCYS+F E+I++I LDADVIS 
Sbjct: 602 REGLPLKAEDQQEYLDWAVSAFEASCSfflVKATTQIHTHMCYSEFHBMIEAIDDLD^ 661 

Query: 658 lETSRSHGDIIESFETAVYPLGIGLGVYDIHSPRIPTKEEIIYNIQRSLKCLSKEQFWVN 717 
IETSRSHG++I +FE Y GIGLGVYDIHSPR+P++EE++ I+R+L L FWVN

Sbjct: 662 lETSRSHGEMISAFEKTTYEKGIGLGVYDIHSPRVPSEEEMLNViRRALTVLPASLFWVN 721 

Query: 718 PDCGLRTREEAETIAALEVLVSATKEVRQQL 748
PDCXSLKTR E ET+aAL+ +V+A + R++L
Sbjct: 722 PDCGLKTRAEKETVAALKNMVAAARAAREEL 752

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2041

A DNA sequence (GBSx2152) was identified in S.agalactiae <SEQ ID 6305> which encodes the amino 
acid sequence <SEQ ID 6306>. This protein is predicted to be metH. Analysis of this protein sequence 
reveals the following:

Possible site: 20 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0753 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05348 GB:AP001512 unknown conserved protein [Bacillus halodurans]
Identities = 301/610 (49%), Positives = 437/610 (71%), Gaps = 9/610 (1%)



Query: 1 MSKFLEPCLKTDILVADGaMGTLLYTYGLDTCHESYimHPEKVLaiHQAyiEftGADVIQT 60
M+ +E IiKT+ILV DGRMGTLLY G+D C E NVT PEK++A H AY+ERGftDVIQT
Sbjct: 1 60

Query: 61 NTYGAQRHRLKNYGLEDQWSINQAAVNIAHQATLGKETFILGTVGGFRSQRQCDLTLDN 120
NTY A R +L Y L+DQV+ IN+AAV +A +A +ETF+LGT+GG RS + ++ +
Sbjct: 61 119

Query: 121 IVEETLEQVEALIATGQLDGLLFETYYDIEEITTVLKIVRE^^ 180
+ + LEQ++AL++ G +DGLL ET+YD+EE + + R +TDLP+I ++S+ E GV
Sbjct: VnTO7PT.T7nMWT!LTiVQTrri-\nY5TJT.T7TT?VTlT.PT?2ilfTtaVGT.aPQT.Tn^
V^l^VrijaS^r'iJIUuJVOlIiU^ VJJUJJ.LUJI2i if XJJUCiILnJNXjnVOJ-lcUXOJJxiJlJlr V XcUIJJ

Query: 181 NGKPIVEALSQLVMLGADVIGLNCHLGPYHMIQSLKQVPLFAQSYLSVYPNASQLSLDGE 240
GK + ER IjGAD++G+Nr +GPY M++SIj4- V L ++Y S YHNRS
Sbjct: 179 236

Query: 241 NSQYQFSQNSEYFGKSAELLVAEGVRLIGGCCGTTPDHIRAVKRSIRGLKPIERKVVTPI 300
Sbjct: 237 T^riPT,VVTTc;7appVTPYT?Mf5TrPT^7ririf^^7PT J.finPPriTTPT?TT\7^ - -
UkJ*\.U X X ilOlN JrHi X £■ J, dl.''lOlSx\.r vyyi,j VX%.lJ±jUOV— ^^IjX i. riiri vXtfirrti\.V V iW3±Jl\.Jr V V OlN-ir vrC"

Query: 301 IPVKDFVRRIRRT---DTLVDKVKKEVTIIAELDPPKHLDIVQFQKAIRAIDQKGIAAIT 357
T TiVT T T T XJ TlvVXvl^T TXX CtlJI-' E C X TX T \JT ATX

Query: 358 I^NSLS]mlXa^^SIASU:lKDEISTPFIlLHIACRDHNLIGIlQSRIlLG^ffiL^ 417
Sbjct: ManMQT.nQPPi/nMT.AT/3aTTnrvT\7naPDT.VH\7TPPnpTcn'.Tr!T.nQWT.MOT.waT^
iTuU'/lNOJJniSxrX^VX/lNlJnJJUnXXS^^^UVUcnn V>.»tUJXUVXlXUlJ^O£lXU*loXlIlAX)UlTlXX/XlLlnX

Query: 418 TGDPTKlGDFPGATSVYDVTSFKLLSLIKQLNQGLSYSGBiSLRRPTDFTVAAAFNPNVKN 477
TGDPTK+GDPPGATSVYDVTSP+L+SLIKOUI+G+S+SG L + +P+V AAPHPNV++
Sbjct: X UJ-Zf J. VVSUf JrUnlO V xUV XOf UJJXOXjXJ\\^UL>IJLV3XO£ OUl\J2iXJUI^JUUNf o vuuvulNirl^l vxul

Query: 478 LTRTVKLIEKKVaSGADYFMTQPIFDHSVLKEIJiDLTKTVEQPFFIGIMPITSYNt^^ 537
L R V+ +EKK+ +GADYFMTQPI++ ++++ + TK +E+P +IGIMP+ + NA FL
Sbjct: 475 LERAVQRMEKKIEAGADYFMTQPIYNEKQIEDIYEATKHIEKPIYIGIMPLINGRNAEFL 534

Query: 538 HNEVPGIKLSESFLSALEKVKDDKEACLTLALNESKSLIDEALNYFNGIYLITPFLRYDIi 597
HNEVPGIKL++ + + +D++ L +KSL+D A +YENGIYLITPFLRY +
Sbjct: 535 HNEVPGIKLTDQIRERMARAGEDRQKGEREGLAIAKSLLDVATHYFNGIYLITPFLRYGM 594

Query: 598 TLELIDYIQK 607
T++L Y+++
Sbjct: 595 TVDLTHYVKE 604





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2042 

A DNA sequence (GBSx2153) was identified in S.agalactiae <SEQ ID 6307> which encodes the amino 
acid sequence <SEQ ID 6308>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.55 Transmembrane 127 - 143 ( 121 - 147) 
INTEGRAL Likelihood = -1.44 Transmembrane 157 - 173 ( 155 - 175) 

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
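
Each `INTEGRAL` line reported for this protein gives a likelihood, a predicted transmembrane core, and a wider range in parentheses. As a hedged illustration of how those fields relate, the sketch below parses the two lines reported for this protein and prints the length of each core segment. The regular expression and the names `TM_RE` and `parse_tm_segments` are assumptions made for this example, not part of the ALOM program itself.

```python
import re

# Assumed layout of one line (illustrative):
#   INTEGRAL Likelihood = -9.55 Transmembrane 127 - 143 ( 121 - 147)
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for lik, start, end, lo, hi in TM_RE.findall(text):
        segments.append({
            "likelihood": float(lik),
            "core": (int(start), int(end)),
            "extended": (int(lo), int(hi)),
        })
    return segments


example = """\
INTEGRAL Likelihood = -9.55 Transmembrane 127 - 143 ( 121 - 147)
INTEGRAL Likelihood = -1.44 Transmembrane 157 - 173 ( 155 - 175)
"""
segments = parse_tm_segments(example)
for seg in segments:
    start, end = seg["core"]
    # Core length in residues, counting both end points.
    print(seg["likelihood"], end - start + 1)
```
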

A related GBS nucleic acid sequence <SEQ ID 10231> which encodes amino acid sequence <SEQ ID
10232> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC01354 GB:AL390975 putative integral membrane protein 
[Streptomyces coelicolor A3(2)]
Identities = 38/98 (38%), Positives = 59/98 (59%)

Query: 113 RIADDVARFGGSWTFIIVFVSIMAIWMLVNIMKPFGIQFDPYPFILLNLALSTIAAIQAP 172 

R+++ VARF G+ FI+ +4- +W++ N+ P G-1-+FD YPFI h h hS A-t- AP 
Sbjct: 47 RLSERVARFLGTGRFIVWMTVVIILWVVWlWSAPSGIlREDEYPFIPLTL^lLSLQa^ 106 

Query: 173 LIMMSQNRAADYDRLQaRlTOFNVNKTSELEIRLLHEKI 210

LI+++QNR D DR+ D N+ S + L +1 
Sbjct: 107 LILLAQNRQDDRDRVNLEQDRKQNERSIADTEYLTREI 144 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8965> and protein <SEQ ID 8966> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -3.84

GvH: Signal Score (-7.5): -5.05 

Possible site: 53 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -9.55 threshold: 0.0
INTEGRAL Likelihood = -9.55 Transmembrane 127 - 143 ( 121 - 147)

INTEGRAL Likelihood = -1.44 Transmembrane 157 - 173 ( 155 - 175)
PERIPHERAL Likelihood = 5.46 27
modified ALOM score: 2.41

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01598(637 - 930 of 1341) 

GP|9714438|emb|CAC01354.1||AL390975 (47 - 144 of 198) putative integral membrane protein

{Streptomyces coelicolor A3(2)}

%Match = 8.2

%Identity = 38.8 %Similarity = 61.2

Matches = 38 Mismatches = 38 Conservative Sub.s = 22

600 630 660 690 720 750 780 810
MKEEEKPENVEERIJSIKQATIGQRIADDVARFGGSWTFIIVFVSIMAIWMLVNIMKPFGIQFDPYPFILLNIALSTIAAIQ
RLDQPRPPRRRLLPEWDPESFGRLSERVARFLGTGRFIVVMrVVIILWVVWNVSAPSGLRPDEYPFIFLTLMLSLQASYA
40 50 60 70 80 90 100

840 870 900 930 960 990 1020 1050
APLimSQNRAADYDRLQBRNDFNVNKrSELEIRLLHEKIDHMVQQDQPELLEIQKLQTEMLVSLGNQLAQLKQLQK*SF
I 11= | |: | : ) :|
APLILLAQNRQDDRDRVNLEQDRKQNERSIADTEYLTREIAALRIGLGEVATRDWIRSELQDLVRDLEERQNGHHPDRGV
120 130 140 150 160 170 180
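
The summary figures reported for this hit (Matches = 38, Mismatches = 38, Conservative Sub.s = 22, %Identity = 38.8, %Similarity = 61.2) can be checked by simple arithmetic. The sketch below assumes that, for a gap-free alignment such as this one, the aligned length is the sum of the three counts (98 positions, matching the 38/98 identities reported in the GENPEPT comparison); the function name `alignment_percentages` is hypothetical, and this relationship is not claimed to hold for the gapped alignments in other Examples.

```python
def alignment_percentages(matches, mismatches, conservative):
    """Return (%identity, %similarity), rounded to one decimal place.

    Assumes a gap-free alignment, so the aligned length is the sum
    of the three position counts.
    """
    aligned = matches + mismatches + conservative
    identity = round(100 * matches / aligned, 1)
    similarity = round(100 * (matches + conservative) / aligned, 1)
    return identity, similarity


# Counts reported for the Streptomyces coelicolor hit.
identity, similarity = alignment_percentages(38, 38, 22)
print(identity, similarity)
```
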

SEQ ID 8966 (GBS393) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 75 (lane 3; MW 30.8kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 177 (lane 4; MW 56kDa) and in Figure
83 (lane 6; MW 56kDa).

GBS393-GST was purified as shown in Figure 217, lane 5.

Example 2043

A DNA sequence (GBSx2154) was identified in S.agalactiae <SEQ ID 6309> which encodes the amino 
acid sequence <SEQ ID 6310>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.29 Transmembrane 274 - 290 ( 271 - 291)

Final Results

bacterial membrane --- Certainty=0.2317 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35508 GB:AE001721 glycerol dehydrogenase [Thermotoga maritima]
Identities = 94/307 (30%), Positives = 157/307 (50%), Gaps = 21/307 (6%)



Query: 63 VyGTDSTQSNIDKLVANPQVQAADAILGFGGGKRLDTAKMVRKELGKNSFTIPTICSNCS 122
++G + + I++L + + D ++G GGGK LDTAK VA +L K +PTI S +
Sbjct: 62 IFGGECSDEEIERLSGLVE-EETDVWGIGGGKTLDTAKAVAYKLKKPWIVPTIASTDA 120

Query: 123 AGTAIAWYNDDHSFLRYGY-PESPLHIFINTRIIAQAPSKYFWAGIGDGISKAPEVERA 181
+A++V+Y + F RY + P +P + ++T I+A+AP+++ AG+GD ++ EE
Sbjct: 121 PCSALSVIYTENGEFKRYLFLPRNPDVVLVOTEIVAKAPARFLVAGMGDALATWFEM 180

Query: 182 TLEAKTNKLPHT-AVLGQAVALSSKEAFYQFGEQGLKDVEANLASRAVEEI--ALDILIS 238
+ N ++ A+A E ++G + VE + A+E+I A +L
Sbjct: 181 KQKYAPimGRIXSSMTAYAIJUUjCYETLLEYGVLAKRSVEEKSVTPALEKIV^^ 240

Query: 239 TGYASISrLVNQPDPYYNSCHAHAFyYGTTAIQRQGEFia3VVVRFGVLV-LBa.YFISffi^ 297
G+ S AHA + G T ++ ++LHG VAGVL L + +
Sbjct: 241 LGFESG GLAAAHAIHNGLTVLENTHKYLHGEKVAIGVLASLFLTDKPRKMI 291

Query: 298 EKVARFNKSLGLPTTLADVSL SEKDIPKIVEIAMTTNE YKNTPFDPKMFAQAIL 351
E+V F + +GLPTTLA++ L S++D+ K+ E A NE + P K A+
Sbjct: 292 EEVYSFCEEVGLPTTLAEIGLDGVSDEDLMKVAEKACDKNETIHNEPQPVTSKDVFFALK 351

Query: 352 AADAFGQ 358
AAD +G+
Sbjct: 352 AADRYGR 358





There is also homology to SEQ ID 3078. 

SEQ ID 6310 (GBS123) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 7; MW 43.3kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2044

A DNA sequence (GBSx2155) was identified in S.agalactiae <SEQ ID 6311> which encodes the amino 
acid sequence <SEQ ID 6312>. Analysis of this protein sequence reveals the following: 

Possible site: 39 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0974 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 6313> which encodes the amino acid 
sequence <SEQ ID 6314>. Analysis of this protein sequence reveals the following: 

Possible site: 17 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2368 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 92/167 (55%), Positives = 121/167 (72%)



Query: 1 MKIAIIGYSGSGKSTLftRKIXSimNCNVIilLDSIHFAPNWEERKroDMIDDVSN^ 60
+KIAIIG+SGSGKSTIifiR LG +Y+C V HLD +HF+ NW+ER DMI D+S L K+
Sbjct: 1 LKIAIIGHSGSGKSTLaRPLGQHVHCEVFHLDQLHFSSNWQERSDHDMIADLSTCLLRQD 60

Query: 61 WI lEGNYKKLLYQERLaDADEI I FFDFNRFNCLWRAFKRYCKFRGKTRPDMANGCPEKLD 120
IIEGNY LY+ER+++AD 11+ +F+RF+C++RAFKRY +RGKTRPDMA+ C EK D
Sbjct: 61 LlIEGNYANCLYEERMSEADYIIYVNFSRFHCVYRAFKRYIiNYRGICrRPDMflDNCQEKFD 120

Query: 121 FEFISWILKDGRSDKQKSNYRQWEDYPQKIKILKHQRDLDQYLKEL 167
F+ WIL DGRS Q Y+ W+ Y K +L +Q+ L Y+ +
Sbjct: 121 VAFVKWIIiIiDGRSRHQLKKYQSVVQKYSHRriVLTNQKQIjSHYMNTI 167




Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2045 

A DNA sequence (GBSx2156) was identified in S.agalactiae <SEQ ID 6315> which encodes the amino
acid sequence <SEQ ID 6316>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3874 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



>GP:CAA41941 GB:X59250 initiation factor IF-1 [Lactococcus lactis] 
Identities = 62/72 (86%), Positives = 70/72 (97%)



Query: 1 PIRKEDVIBIEGKVVETMPNAMFTTOLENGHQILATVSGKIRKNYIRILVGDRVTVEM 60 
MAK+DVIE++GKVV+TMPNftMFTVELENGHQ+LAT+SGKIRKNYIRIL GD+V VE+SPY

Sbjct: 1 MAKDDVIEVDGKVVDTMPNAMPTVELENGHQVIjATISGKIRKNYIRILPGDKVQVELSPY 60 

Query: 61 DLTRGRITYRFK 72

DLTRGRITYRFK 
Sbjct: 61 DLTRGRITYRFK 72 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6317> which encodes the amino acid 
sequence <SEQ ID 6318>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3253 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 67/67 (100%), Positives = 67/67 (100%)

Query: 6 VIEIEGKWETMPNAMFTVELENGHQILATVSGKIRKNYIRILVGDRVTVEMSPYDLTRG 65 

VIEIEGKWETMPNAMFTVELENGHQILATVSGKIRKNYIRILVGDRVTVEMSPYDLTRG 
Sbjct: 1 VIEIEGKWETMPNftMFTVELENGHQILATVSGKIRKNYIRILVGDRVTVEMSPYDLTRG 60 

Query: 66 RITYRFK 72 

RITYRFK 
Sbjct: 61 RITYRFK 67 
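
The "Identities = n/m (p%)" summaries in these alignments count aligned positions at which the query and subject residues are the same. The following minimal sketch shows that count for a pair of already-aligned strings, using the final block of the GAS/GBS alignment as input. The function names are hypothetical, and the integer truncation used for the percentage is an assumption chosen because it reproduces figures reported in this document (for example 62/72 shown as 86%).

```python
def count_identities(query, sbjct):
    """Count aligned positions where both sequences carry the same residue.

    Both strings must already be aligned (equal length); '-' marks a gap
    and is never counted as an identity.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def percent_truncated(count, total):
    """Whole-number percentage, truncated rather than rounded (assumption)."""
    return count * 100 // total


# Final block of the alignment: Query 66-72 against Sbjct 61-67.
block_identities = count_identities("RITYRFK", "RITYRFK")
print(block_identities, percent_truncated(67, 67), percent_truncated(62, 72))
```
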

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2046 

A DNA sequence (GBSx2157) was identified in S.agalactiae <SEQ ID 6319> which encodes the amino 
acid sequence <SEQ ID 6320>. This protein is predicted to be adenylate kinase (adk). Analysis of this 
protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA41940 GB:X59250 adenylate kinase [Lactococcus lactis]
Identities = 146/214 (68%), Positives = 170/214 (79%), Gaps = 6/214 (2%)



Query: 1 MNLLIMGLPGAGKGTQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAKSYIDKGELVP 60
MNLLIMGLPGAGKGTQA IV+ +GV HISTGDMFRAAM N+TEMG+LAKS+IDKGELVP
Sbjct: 1 mtjLIMGLPGaGKGTQAEFIVKNYGVNHISTGDMFRAaMKNETEMGKLAKSFIDRGELVP 60

Query: 61 DEVTNGIVKERIJffiDDIAEKGFI.LDGYPRTIEQaHaLDATLEELGLRLDGVINIIWDPSC 120
DEVTNGIVKERIA+DDI GFLLDGyPRTI+Q2iHaiiD LEELG++LD V+NI V+P+
Sbjct: 61 DEVTNGIVKERLAQDDIKASGFLLDGYPRTIDQAHAIJJTMLEELGIKLDAVVNIVVNPNI 120

Query: 121 LIERLSGRIINRKTGETFHKVFNPPV DYKEEDYYQREDDKPETVKRRLDVNIAQ 174
L++RLSGR I R G T+HK+FNP D YQR DD PETVK RLDVNI +
Sbjct: 121 LVDRLSGRYICRNCGATYHKIFNPTKVEGTCDVaSSHDLYQRflDDVPETVKNRLDVNIKE 180

Query: 175 GEPILEHYRKLGLVTDIEGNQEITEVPADVEKaL 208
PI+EHY +IjGLV +IEG QEI++V D++K L
Sbjct: 181 SAPIIEHYTELGLVKNIEGEQEISQVTDDIKKVL 214

A related DNA sequence was identified in S.pyogenes <SEQ ID 6321> which encodes the amino acid
sequence <SEQ ID 6322>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 208/212 (98%), Positives = 212/212 (99%)


Query: 1 MNLLIMGLPGBiGKBTQAAKIVEEFGVaHISTGDMFRAaMaNQTEMGRLAKSYIDKGELVP 60 

MNLLI^raLPGa6KBTQaAKIVEEFG+AHISTGDMFRAaMaNQTEMGRLAKSYIDRGELTO 
Sbjct: 1 MNIilMGLPCSftGKBTQAAKIVEEPGIMISTGDMFRAftMfiNQTEMGRLAKS 60 

Query: 61 DEVTNGIVKERIiAEDDIAEKGFLLDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSC 120

DEVTNGIVKERI^DDIAEKGFLIJDGYPRTIEQAHALDATLEELGLRLDGVINIKVDPSC 
Sbjct: 61 DEVTNGIVKERIAEDDIAEKGFLLDGYPRTIEQAHALiaTLEELGIJUiDGVINIKVDPSC 120 

Query: 121 LIERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVNIAQGEPILE 180 
L+ERLSGRIINRKTGETFHKVFNPPVDYKEEDYYQREDDKPETVKRRLDVN+AQGEPILE

Sbjct: 121 LVERLSGRIINRKTGETFHKAreOTPVDYICEEDyYQREDDKPETVKERIiDVNMAQGEPILE 180 

Query: 181 HYRKLGLVTDIEGNQEITEVFADVEKALLELK 212 
HYRKLGLVTDIEGNQEIT+VFADVEKALLELK 
Sbjct: 181 HYRKLGLVTDIEGNQEITDVFADVEKALLELK 212

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8967> and protein <SEQ ID 8968> were also identified. Analysis of this 
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 0 
McG: Discrim Score: -1.04 
GvH: Signal Score (-7.5): -1.08 

Possible site: 17 
>>> Seems to have no N-terminal signal sequence

ALOM program count: 0 value: 6.79 threshold: 0.0
PERIPHERAL Likelihood = 6.79 106
modified ALOM score: -1.86

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

over 213aa

Lactococcus lactis

EGAD|8612| adenylate kinase Insert characterized

SP|P27143|KAD_LACLA ADENYLATE KINASE (EC 2.7.4.3) (ATP-AMP TRANSPHOSPHORYLASE). Edit
characterized

GP|44074|emb|CAA41940.1||X59250 adenylate kinase Insert characterized
PIR|S17987|S17987 adenylate kinase (EC 2.7.4.3) - subsp. lactis Insert characterized
PIR|B44812|B44812 adenylate kinase (EC 2.7.4.3) - Insert characterized

ORF01658(301 - 924 of 1236)

EGAD|8612|8416 (1 - 214 of 215) adenylate kinase {Lactococcus lactis} SP|P27143|KAD_LACLA
ADENYLATE KINASE (EC 2.7.4.3) (ATP-AMP TRANSPHOSPHORYLASE). GP|44074|emb|CAA41940.1||X59250
adenylate kinase {Lactococcus lactis} PIR|S17987|S17987 adenylate kinase (EC 2.7.4.3)
Lactococcus lactis subsp. lactis PIR|B44812|B44812 adenylate kinase (EC 2.7.4.3)
Lactococcus lactis
%Match = 34.8

%Identity = 69.5 %Similarity = 81.0

Matches = 146 Mismatches = 38 Conservative Sub.s = 24



132 162 192 222 252 282 312 342
QaYSF*LQRVLKV*NNSRAIF*RDMLDS*IQQl!ilRI*VDSVNLLFCFLISPTCCW3PI*K^
IIIIMIIIIIIII
MNLLIMQLPGAGKQ
10

372 402 432 462 492 522 552 582
TQAAKIVEEFGVAHISTGDMFRAAMANQTEMGRLAiCSYIDKGELVPDEVTNGIVKERLaEDDIAEKGFLLDGYPRTIEQA
III II: :|| llllllllllll |:|ll|:|ll|:|lllllllllllllllllllhlll I II I I I I I I I I : I I
TQAEFIVKNYGVNHISTGDMFRAAMKIiffiTEMGKliAKSFIDRGELVPDEVTNGIVKERL^
30 40 50 60 70 80 90

612 642 672 702 732 774 804
HALDATLEELGLRLDGVINIKVDPSCLIERLSXRIINRKTGETFHKVFNPP----VDY-KEEDYYQREDDKPETVKRRL
HALDTMLEELGIKLDAVVNIVWPNIL^roRLSGRYICRNCGATYHKIFNF^KVEGTCDVCGSHDLYQRADDVPETVKr^
110 120 130 140 150 160 170

834 864 894 924 954 984 1014 1044
DTOIAQGEPILEHYRKLGLVTDIEGNQEITEWM3VEKALLELK*IMLIYLHK*ISNDILS*SDL*LLPLYRGHQIEI*G
ill! = Ihlll :|lll :!!! Il|::| l-l I
DVNIKESAPIIEHYTELGLVKNIEGEQEISQVTDDIKKVLG
190 200 210

SEQ ID 8968 (GBS114) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 9; MW 26.9kDa). 

The GBS114-His fusion product was purified (Figure 108A; see also Figure 200, lane 8) and used to
immunise mice (lane 1+2+3 product; 20 µg/mouse). The resulting antiserum was used for Western blot
(Figure 108B), FACS (Figure 108C), and in the in vivo passive protection assay (Table III). These tests
confirm that the protein is immunoaccessible on GBS bacteria and that it is an effective protective 
immunogen. 






Example 2047 

A DNA sequence (GBSx2158) was identified in S.agalactiae <SEQ ID 6323> which encodes the amino 
acid sequence <SEQ ID 6324>. This protein is predicted to be preprotein translocase secy subunit (secY). 
Analysis of this protein sequence reveals the following: 



Possible site: 35 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -14.01 Transmembrane 217 - 233 ( 209 - 240)
INTEGRAL Likelihood = -8.65 Transmembrane 314 - 330 ( 307 - 334)
INTEGRAL Likelihood = -6.16 Transmembrane 369 - 385 ( 363 - 392)
INTEGRAL Likelihood = -5.36 Transmembrane 19 - 35 ( 17 - 40)
INTEGRAL Likelihood = -3.93 Transmembrane 180 - 196 ( 179 - 199)
INTEGRAL Likelihood = -3.03 Transmembrane 395 - 411 ( 392 - 412)
INTEGRAL Likelihood = -2.55 Transmembrane 151 - 167 ( 151 - 168)
INTEGRAL Likelihood = -2.02 Transmembrane 117 - 133 ( 117 - 133)
INTEGRAL Likelihood = -0.64 Transmembrane 270 - 286 ( 269 - 286)

Final Results

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9467> which encodes amino acid sequence <SEQ ID 9468> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA41939 GB:X59250 SecY protein [Lactococcus lactis] 
Identities = 292/433 (67%) , Positives = 361/433 (82%) , Gaps = 2/433 (0%) 

Query: 1 MFLKLLRDALK^K^WRNKILFTIFILLVERIGTHITVPGINVKSLEQMGELPFIlNMLNLV 60 

MF K L++A KVK VR +ILFTIFIL VFR+G HIT PG+NV++L+Q+ +LPFL+M+NLV 
Sbjct: 1 MFFKTLKEAFKVKDVRaRILPTIFILFVFRLGRHITAPGVNVQNLQQVaDLPFLSM^ 60 

Query: 61 SGNftMRNPSVFSMGVSPYITASIWQLLQMDII.PKFVEWGKQGEVGRRKIJ!IQATRYISLF 120 

SGNAM+N+S+F+MGVSPYITASI+VQLLQMDILPKFVEW KQGE+GRRKUSrQATRYI+L 
Sbjct: 61 SGNBMQNySLFAMGVSPYITASIIVQLLQMDILPKFTTBWSKQGEIGRRKiaiQATRYITLV 120 

Query: 121 LAFVQSIGITAGFNTLSSVALVKTPNVQTYIjLIGAILTTGSMWTWLGEQITDKGFGNGV 180 

LA QSIGITAGF +SS+ +V+ PN Q+YL+IG +LTTGSMVVTW+GEQI +KGFG+GV 
Sbjct: 121 LAmQSIGITAGFQAMSSIiNIVQNPNWQSYMIGVLLTTGSMWTWMGEQINEKGFGSGV 180 

Query: 181 SMIIFAGIISSIPSAITTIYEDPFVNVRSSAITMSYIFVGILIVaVLAIVFFTTFIQQAE 240
S+IIFftGI+S IPSAI ++Y++ F+NVR S I S+IFV LI++ + I++ TTF+QQRE 
Sbjct: 181 SVIIFAGrVSGIPSAIKSVYDEKFIiNVRPSEIPMSWIFVIGLiriSAIVIIYVTTFVQQaE 240 

Query: 241 YKIPIQYTKLVQGAPTSSYLPLKVNPAGVIPVIFASSITTIPSTI1PFFQ--NGKEIPWL 298 

K+PIQYTKL QGAPTSSYLPIi+VNPAGVIPVXFA SITT P+T1+ F Q G + WL 
Sbjct: 241 RKVPIQYTKLTQGAPTSSYLPLRVNPAGVIPVIFAGSITTAPATIIiQFLQRSQGSNVGWL 300 

Query: 299 TKLQELLNYQTPVOMIIYAILIIIiFSFFYTFVQVNPEKTAENLQKNSSYIPSIRPGRETE 358

+ LQ L+Y T GM+ YA+LI+LF+FFY+FVQVNPEK AENLQK SYIPS+RPG+ TE
Sbjct: 301 STI<3^1ALSYTTl/m3MLFYALLIVLFTFFYSFVQVNPEKMAENLQKQGSYIP^^ 360

Query: 359 EYMSSLLKKLATIGSVFLAFISLLPIIAQQALHLSSSIALGGTSLLILIATGIEGMKQLE 418

+Y+S LL +LAT+GS+FL IS++PI AQ L +ALGGTSLLILI 1+ +KQLE
Sbjct: 361 KYVSRLLMRLATVGSLFLGLISIIPIARQNVWGLPKIVRLGGTSLLILIQVAIQAVKQLE 420

Query: 419 GYLLKRRYVGFMN 431

GYLLKR+Y GFM+
Sbjct: 421 GYLLKRKYAGFMD 433

A related DNA sequence was identified in S.pyogenes <SEQ ID 3987> which encodes the amino acid
sequence <SEQ ID 3988>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -14.70 Transmembrane 233 - 249 ( 226 - 255)
INTEGRAL Likelihood = -8.12 Transmembrane 330 - 346 ( 323 - 350)
INTEGRAL Likelihood = -6.10 Transmembrane 384 - 400 ( 378 - 403)
INTEGRAL Likelihood = -5.20 Transmembrane 35 - 51 ( 33 - 56)
INTEGRAL Likelihood = -4.09 Transmembrane 199 - 215 ( 195 - 215)
INTEGRAL Likelihood = -3.56 Transmembrane 167 - 183 ( 165 - 184)
INTEGRAL Likelihood = -1.65 Transmembrane 411 - 427 ( 411 - 428)
INTEGRAL Likelihood = -1.49 Transmembrane 133 - 149 ( 133 - 149)
INTEGRAL Likelihood = -0.64 Transmembrane 286 - 302 ( 285 - 302)

Final Results

bacterial membrane --- Certainty=0.6880 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 377/434 (86%), Positives = 417/434 (95%)

Query: 1 MFLKIiRDALKVKMVimKIIjFTIFIIjIiVFRIGTHITVPGINVKSLEQMGELPFI^^ 60 

MPLK+L+DM.K+K VBNKi FTIPI+LVPRIGTHITVPG+N KSI1EQ+ ELPFUSIMIJOTiV 
Sbjct: 17 MFLKILKDALKIKTVKNKIFFTIFIILVFRIGTHIWPGVNAKBLEQLSELPFW^ 76 

Query: 61 SGNAMRNFSVFSMGVSPYITASIWQLLQMDILPKFVEWGKQGEVGRRKLNQATRYISLF 120 

SGNAMRNFSVFSMGVSPYITASIWQLLQMDILPKFVEWGKQGEVGRRKLNQATRYISL 
Sbjct: 77 SGNAMRNFSVFSMGVSFYIlMIWQLLQMDILPKFVEWGKQGEVGRRKIiNQATRYISLV 136 

Query: 121 LAFVQSIGITAGFm'LSSVaLVKTEOTQTYLLIGAILTTGSMVVTWLGEQITDRGPGNGV 180 

LAF QSIGITAGFOTLS+VALVlCTP+++TYLLIGa+LTTGS++VTWLGEQITDKGFGNGV 
Sbjct: 137 LAFAQSlGITOGFOTTiSNVaLVKTPDIKTYLLlGRIiLTTGSVIVTWI/^ 196 

Query: 181 SMIIFAGIISSIPSAITTIYEDFFVNVRSSAITNSYIFVGILIVAVLAIVFFTTFIQQAE 240 

SMIIFAGIISSIPSAI TI ED+FVNV++S + +SY+ VGILI+AVLAIVFFTT++QQAE 
Sbjct: 197 SMIIFAGIISSIPSAIATIREDYFVNVKASDLHSSYLIVGILIIAVLAIVFFTTYVQQAE 256 

Query: 241 YKIPIQYTKLVQGAPTSSYLPLKUNPAGVTPVIFASSITTIPSTIIPFFQNGKEIPWLTK 300 

YKIPIQYTKL+QGAPTSSYLPLKVNPAGVIPVIFASSITTIPSTIIPF QNG+++PML + 
Sbjct: 257 YKIPIQYTKMQGAPTSSYLPLKVNPAGVIFVIFASSITTIPSTIIPFVQNGRDLPWIJSIR 316 

Query: 301 LQELLNYQTPVGMIIYAILIILFSFFYTFVQVNPEKTAENLQKNSSYIPSIRPGRETEEY 360 

LQE+ NYQTPVGMI+YA+LIILFSFFYTFVQVNPEKTAENIiQKNSSYIPS+RPGRETE++ 
Sbjct: 317 LQEIFHYQTPVGMIVYALLIILPSFFYTFVQVNPEKTAENLQKNSSYIPSVRPGRETEQF 376 

Query: 361 MSSLLKKIATIGSVFLAFISLLPIIAQQALHLSSSIAIflGTSIJliILIATGIEGMRQLEGY 420 

MS+LLKKLAT+G++FLAPISI1 PI AQQAL+LSSSIALGGTSLLILI+TGIEOMKQLEGY 
Sbjct: 377 MSALLKKLATVGAIPLAFISLAPIAAQQAimiSSSIALGGTSLLILISTGlEGMKQIiEGY 436 

Query: 421 LLKRRYVGFMNTTE 434 

LLKR+YVGFMNT E 
Sbjct: 437 LLKRKYVGFMNTAE 450 

A related GBS gene <SEQ ID 8969> and protein <SEQ ID 8970> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 6.16 
GvH: Signal Score (-7.5): -4.32 

Possible site: 35 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 9 value: -14.01 threshold: 0.0
INTEGRAL Likelihood = -14.01 Transmembrane 217 - 233 ( 209 - 240)
INTEGRAL Likelihood = -9.98 Transmembrane 311 - 327 ( 307 - 334)
INTEGRAL Likelihood = -6.16
modified ALOM score: 3.30



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01657(301 - 1596 of 1902)

EGAD|6545|6344 (1 - 434 of 439) preprotein translocase secy subunit {Lactococcus lactis}
SP|P27148|SECY_LACLA PREPROTEIN TRANSLOCASE SECY SUBUNIT. GP|44073|emb|CAA41939.1||X59250
SecY protein {Lactococcus lactis} PIR|S17985|S17985 preprotein translocase secY
Lactococcus lactis subsp. lactis
%Match = 46.6

%Identity = 67.0 %Similarity = 84.1
Matches = 290 Mismatches = 68 Conservative Sub.s = 74

72 102 132 162 192 222 252 282 

HQCKRICSCEP*PIKCL*RWY*SNSSCS*RSWiqRAC*KIRR*NSW*W*IN*EIVC*SS*IF*IC*SSYHC*RWENRSHLI 



10 312 342 372 402 432 462 492 522 

NER*LIMFLKLLRDALKVKMVRNKILFTIFILLVFRIGTHIWPGINVKSLEQMGELPF1:jMMIiNLVSGNAMRNFSV 

\----\--\\\ II Ollllllbllhl III lhlh:|:|: : I I I I = I = I I I I I I I I : I : I : I : I I 
MFFKTLKEAFKVKDVRARILFTIFILFVFRLGAHITAPCSVNVQNLQQVa^ 

10 20 30 40 50 60 70 

15 

552 582 612 642 672 702 732 762 

VSPYITASIVVQLLQMDILPKFVEWGKQGEWSRRKIiNQATRYISLFLaFVQSIGITAGFOT'LSSVaLVKTEIW^ 

llllllllhllliillllllllli IIIMIIIIIIIIIII:] 11 Hlllllll :||: :|: II MMI 
VSPYITASIIVQLLQMDILPKFVEWSKQGEIGRRKLNQATRYITLVLAMAQSIGITAGFQftMSSLNIVQNPNWQSYLMIG 
20 90 100 110 120 130 140 150 



792 822 852 882 912 942 972 1002 

AILTTGS^m7TWLGEQITDRGFGNGVSMIIFAGIISSIPSAITTIYEDFFVNVRSSAITNSYIFVGILIVAVLAIVFFTT 

=lllllllll|:|||| =llll=ll|:|lllll=l Mill ::|:: 1=111 I I Mil 11== = I- || 
25 VliTTGSMVVTVmiGEQINEKGFGSGVSVIIFAGIVSGIPSAIKSVYDEKFUWRPSEIPMSWIFVlGniLSAIVIIYVTT 
170 180 190 200 210 220 230 



1032 1062 1092 1122 1152 1176 1205 1236 

FIQQAEYKIPIQYTKLVQGAPTSSYLPLKVNPAGVIPVIFASSITTIPSTIIPFFQ--NGKEIPWLTKLQELLNYQTPVG 

30 Willi hlllllll llllllllllhllllllllllll Mil hlh hi I : Ih II hi I I 

FVQQAERKVPIQYTKLTQGAPTSSYLPLRWPAGVIPVIFAGSITTAPATIIiQFLQRSQGSNVGWLSTI^JNM 

250 260 270 280 290 300 310 

1266 1296 1326 1356 1386 1415 1445 1476 

35 MIIYAILIILFSFFXTFXQVNPEKTAENLQKNSSYIPSIRPGRETEEYMSSLLKKLATIGSVFLAFISLLPIIAQQALHL 

h Ihlhlhll :i llllll mill llllhllh ll:hl II :|lhlhll Hi-ll 11 I 
MLFYALLIVLFTFFYSFVQWPEKMAEMLQKQGSYIPSVRPGKGTEKYVSRLLMRIATOGSLFL^^ 

330 340 350 360 370 380 390 

40 1506 1536 1566 1596 1626 1656 1686 1716 

SSSIALGGTSLLILIATGIEGMKQLEGYLLKRRYVGFMNTTE*NIG*LCQPSILFETSKSDMLCWIYLKTK*GDYlffiSENY 

:iiiiimm h :imiiiiihi iih 

PKIVALGGTSIiLILIQVAIQAVKQLEGYLLKRKYAGFMDNPLETK 
410 420 430 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2048 

A DNA sequence (GBSx2159) was identified in S.agalactiae <SEQ ID 6325> which encodes the amino
acid sequence <SEQ ID 6326>. This protein is predicted to be 50S ribosomal protein L15 (rplO). Analysis
of this protein sequence reveals the following:

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5259 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB54021 GB:U96620 ribosomal protein L15 [Staphylococcus aureus]
Identities = 116/146 (79%), Positives = 128/146 (87%)

Query: 1 MKLHELKPJffiGSRKSWHRVGRGTSSGNGKTSGRGQKGQKSftSGGGVRIiGFEGGQTPIjFRR 60 

MKLHBLKPAEGSRK RISIRVGRG ++GN6K:TSGRG KGQKARSGGGVR GFEGGQ PLFRR 
Sbjct: 1 MKLHELKPAEGSRKERNRVGRGVATGNGKTSGRGHKGQKRRSGGGVRPGFEGGQLPLFRR 60

Query: 61 MPKRGFSNINAKEYMjWLDQLNVFEDGTEVTPWLKEAGIVRAEKSGVKILGNGELTKK 120 

+PKRGF+NIN KEYA+VNLDQUJl FEDGTEVTP +L E+G+V+ EKSG+KILGNG L KK 
Sbjct: 61 LPKRGFTNIlTOKEYAIVNLDQMIKPEIXSTEVTPALLVESGVVKIffiKSG 120 

Query: 121 LSYKAAKPSKSAEftAITAKiGGSIEVI 146 

L+VKA KFS SA AI AKGG+ EVI 
Sbjct: 121 LTVKAHKFSASAAEAIDAKGGAHEVI 146 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6327> which encodes the amino acid
sequence <SEQ ID 6328>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5329 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 135/146 (92%), Positives = 142/146 (96%)

Query: 1 MKLHELKPAEGSRKVRNRVGRGTSSGNGKTSGRGQKGQKaRSGGGVRLGFEGGQTPLFRR 60 
MKLHELK AEGSRKVRITOVGRGTSSGjSIGKTSGRGQKGQKARSGGGVRLGFEGGQTPLFRR 
Sbjct: 1 MmffiLKAAEGSRKVRNRVGRGTSSGNGKTSGRGQKGQKARSGGGVRLGFEGGQTPLFRR 60

Query: 61 MPKRGFSNINAKEYALVNLDQIjmEDGTEVTPVVLKEAGrVRAEKSGVKII,^ 120

+PKRGF+NIN KEYALVNLDQLNVF+DGTEVTP +LK+AGIVRAEKSSVK+LGNGELTKK 
Sbjct: 61 IPKRGFTNINTKEYAL\7NIjDQIjmT)DGTEVTPAILKDRGIVRAEKSGVKVL^ 120 


Query: 121 LSVKAAKFSKSAEAAITAKGGSIEVI 146 

L+VKAAKFSKSAEAAI AKGGSIEVI 
Sbjct: 121 LTVKARKFSKSAEBiAIlAKSGSIEVI 146 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
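
The alignment summaries in this example report counts and whole-number percentages, for example "Identities = 116/146 (79%)". The following Python sketch is an illustration only and is not part of the original analysis. It parses such a summary line and recomputes the identity percentage. The helper names `parse_summary` and `truncated_percent` are hypothetical, and the truncation rule is an assumption inferred from identity values printed in this section (for instance 86/120 printed as 71%). It is applied here to Identities only, since some Positives percentages printed in this section differ by one from this rule.

```python
import re

# Matches summary fields such as "Identities = 116/146 (79%)".
FIELD_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_summary(line):
    """Return {field: (count, length, printed_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in FIELD_RE.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage with the fractional part dropped (assumed convention)."""
    return (100 * count) // length


summary = parse_summary("Identities = 116/146 (79%) , Positives = 128/146 (87%)")
count, length, printed = summary["Identities"]
print(truncated_percent(count, length), printed)  # prints: 79 79
```

A mismatch between the recomputed and printed identity percentage would be a quick hint that one of the counts was damaged in transcription.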

Example 2049 

A DNA sequence (GBSx2160) was identified in S.agalactiae <SEQ ID 6329> which encodes the amino
acid sequence <SEQ ID 6330>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1162 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
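
The Final Results block above gives one certainty score per predicted location. The following Python sketch is an illustration only and is not part of the original analysis. It assumes each line has the form `bacterial <site> ... Certainty=<score>(<call>)` and tolerates stray spaces around the decimal point. The names `parse_final_results` and `most_likely_site` are hypothetical helpers introduced for this example.

```python
import re

# One line of a "Final Results" block, e.g.
# "bacterial cytoplasm --- Certainty=0.1162 (Affirmative) < succ>"
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(?P<score>\d(?:\s*\.\s*\d+)?)\s*"
    r"\((?P<call>[^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every localization line found."""
    results = {}
    for match in LINE_RE.finditer(text):
        # Remove any whitespace inside the number before converting it.
        score = float(re.sub(r"\s+", "", match.group("score")))
        results[match.group("site")] = (score, match.group("call").strip())
    return results


def most_likely_site(results):
    """Return the site with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda site: results[site][0])
```

Applied to the three lines above, `most_likely_site` would select the cytoplasm, which is the only location with a non-zero certainty.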

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB54020 GB:D96620 ribosomal protein L30 [Staphylococcus aureus]
Identities = 40/58 (68%), Positives = 46/58 (78%)

Query: 1 MAQIKITLTKSPIGRKPEQRKTVVALGLGKLMSSVVKEDNAAIRGMVNAISHLVTVEE 58 
MA+++ITLT+S IGR QRKTV ALGL K NSSW EDN AIRG +N + HLVTVEE
Sbjct: 1 MAKLQITLTRSVIGRPETQRKTVEAIfiLKKraSSVVVEDNPAIRGQINKVK^^ 58

A related DNA sequence was identified in S.pyogenes <SEQ ID 6331> which encodes the amino acid
sequence <SEQ ID 6332>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 56/58 (96%), Positives = 57/58 (97%)

Query: 1 MAQIKITLTKSPIGRKPEQRKTWALGLGKLNSSVVKEDNaaiRGMVNAISm 58
MAQIKITLTKSPIGRKPEQRKTVVRLGLGKLNSSVVKEDNAAIRGMV AISHr,VTVE+ 
Sbjct: 1 MRQIKITLTKSPIGRKPEIQRKTVVRLGIiQKIiNSSVVKEDNRRIRGNIVTAISHLVT^ 58 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2050 

A DNA sequence (GBSx2161) was identified in S.agalactiae <SEQ ID 6333> which encodes the amino 
acid sequence <SEQ ID 6334>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3226 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2051 

A DNA sequence (GBSx2162) was identified in S.agalactiae <SEQ ID 6335> which encodes the amino 
acid sequence <SEQ ID 6336>. This protein is predicted to be 30S ribosomal protein S5 (rpsE). Analysis of
this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3179 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA22699 GB:M57621 ribosomal protein S5 [Bacillus stearothermophilus]
Identities = 119/158 (75%), Positives = 139/158 (87%)

Query: 6 NAVELEERWAINROTKOTKGGRIOJlFAaLWVGDRNGRVGFGTGKAQEVPEAIRKAVEA 65 

N +ELEERWA+NRV KWKGGRRLRP+ALVWGD+NG VGFGTGKAQEVPEAIRKR+E 
Sbjct: 7 NKLELEBRWAVmVAKVVRGGRRLRPSALVVVGDKNGHVGFGTGKRQEVPEAIRKAIED 66 

Query: 66 AKKNMVEVPMVGTTIPHEVRSEFGGAKVLLKPAVEGAGVaAGGAVRAVIELAGVADITSK 125 

AKKN++EVP+VGTT1PHEV FG +++LKPA EG GV AGG RAV+ELAG++DI SK 
Sbjct: 67 AKKNLIEVPIVGTTIPHEVIGHFGAGEIILKPASEGTGVIAGGPARAVLELAGISDILSK 126 

Query: 126 SLGSNTPINIVRATVEGLKQIiKRAEEVAALRGISVSDL 163 

S+GSNTPIN+VRAT +GLKQLKRaE+VA LRG +V +l' 
Sbjct: 127 SIGSOTPIiroiVRATPDGLRQLKRAEDVAKLRGRrVEEL 164 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6337> which encodes the amino acid
sequence <SEQ ID 6338>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3179 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 158/164 (96%), Positives = 161/164 (97%)

Query: 1 MAFKDNAVELEERWAINRVTKWKGGRRLRFAALWVGDRNGRVGFGTGKAQEVPEaiR 60 
MAFKDNAVELEERWAINRVTKWKGGRRLRFAALVWGD NGRVGFGTGKAQEVPEAIR 
Sbjct: 1 MAFKENAVELEERVVAINRVTKVVKGGRRLRFAALVVVGDGNGRVGFGTGKAQEVPEAIR 60

Query: 61 KAVEAAKKNMVEVPMVGTTIPHETOSEFGGAKVLLKPAVEGAGVAAGGAVRAVIELaGVA 120 

KAVEAAKKNM+EVPMVGTTIPHEV + FGGAKVLLKPAVEG+GVAAGGAVRAVIELAGVA 
Sbjct: 61 KAVEAAKKNMIEVPMVGTTIPHEWTNFGGAKVLLKPAVEGSGVAAGGAVRAVIELAGVA 120 


Query: 121 DITSKSLGSNTPINIVRATVEGLKQLKRAEEVAALRGISVSDLA 164 

DITSKSLGSNTPINIVRATVEGLKQLKRAEEVAAQRGISVSDLA 
Sbjct: 121 DITSKSLGSNTPINIVRATVEGLKQLKRAEEVARLRGISVSDIiA 164 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2052 

A DNA sequence (GBSx2163) was identified in S.agalactiae <SEQ ID 6339> which encodes the amino 
acid sequence <SEQ ID 6340>. This protein is predicted to be 50S ribosomal protein L18 (rplR). Analysis
of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9465> which encodes amino acid sequence <SEQ ID 9466>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB06815 GB:L47971 ribosomal protein L18 [Bacillus subtilis]
Identities = 86/120 (71%), Positives = 97/120 (80%), Gaps = 2/120 (1%)

Query: 4 VISKPDKNKIRQKMIRRTOGKLSGTiUSRPRLNIFRSNTGIYAQVIDDVaGVTIiASASTIiD 63 

+I+K KN R KRH RVR KIiSGTA+RPRIiN+PRSN IYAQ+ID0V GVTLASASTLD 
Sbjct: 1 MITKTSKNaaRLKRHARVRAKLSGTTiERPRIOTFRSNKHIYAQlIDDVNGVTLASaSTLD 60 

Query: 64 KE--VSNGTKTEQAVWGKLVAERAVAKGISEWFDRGGYLYHGRVKAIiADSARENGLKF 121 

K+ V 4- T A VG+LVA+RA KGIS+WFDRGGYLYHGRVKALAD+ARE GLKF 
Sbjct: 61 KDLIWESTGDTSAATKVGELVAKRAAEKGISDVVFDRGGYLYHGRVKALftDAAREAGLKF 120 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6341> which encodes the amino acid
sequence <SEQ ID 6342>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 116/121 (95%), Positives = 120/121 (98%)

Query: 1 MKIVISKPDKNKIRQKRHRRVRGKLSGTADRPRLNIFRSNTGIYAQVIDDVAGVTLASAS 60 

+KIVISKPDKNKIRQKRHRRVRGKLSGTADRPRLN+FRSNTGIYAQVIDDVAGVTLASAS 
Sbjct: 1 WIVISKPDJCKTKIRQKRHRRTOGKLSGTADRPRIOTFRSNTGIYAQVIDDVAGVTLASAS 60

Query: 61 TLDKEVSNGTKTEQAVWGKLVAERAVAKGISEWFDRGGYLYHGRVKALADSARENGLKF 121 

TLDK+VS GTKTEQAVWGKLVAERAVAKGISEWFDRGGYLYHGRVKALAD+ARENGLKF 
Sbjct: 61 TLDKDVSKGTKlEQAVWGKLVAEa^AVARGISEVVFDRGGYLYHGRVKALRDAARENGLKF 121 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2053 

A DNA sequence (GBSx2164) was identified in S.agalactiae <SEQ ID 6343> which encodes the amino 
acid sequence <SEQ ID 6344>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1530 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA22700 GB:M57622 ribosomal protein L6 [Bacillus stearothermophilus]
Identities = 108/178 (60%), Positives = 133/178 (74%)

Query: 1 MSRIGNKVITLPAGVEIINKDIWVTVKGPKGQLTREFNKNIGITVEGTEVTVTRPNDSKE 60

M R+G K I +PAGV + N VTVKGPKG+LTR F+ ++ ITVEG +TVTRP+D K 
Sbjct: 1 MXRVGKKPIEIPAGVT\mWGWTVTVKGPKGELTRTFHPDOTITVEGNVITVTRPSDEKH 60 

Query: 61 MKTIHGTTRANIMJMWGVSEGFKKALEMRGVGYRAQLQGSK1.VLSVGKSHQDEVEAPEG 120 

+ +HGTTR+ L NMV GVS+G++KALE+ GVGYRA QG KLVLSVG SH E+E EG 
Sbjct: 61 HRALHGTTRSLLANMVEGVSKGYEKALELVGVGYRASKQGKKLVLSVGYSHPVEIEPEEG 120 

Query: 121 VTFEVPTPTTINVIGINKESVGQTAAYVRSLRSPEPYKGKGIRYVGEFVRRKEGKTGK 178
+ EVP+ T I V G +K+ VG+ +R++R PEPYKGKGIRY GE VR KEGKTGK
Sbjct: 121 LEIEVPSQTKIIVKGftDKQRVGELAftNIRAVRPPEPyKGKGIRYEGELVRLKEGKTGK 178

A related DNA sequence was identified in S.pyogenes <SEQ ID 6345> which encodes the amino acid 
sequence <SEQ ID 6346>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1704 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 153/178 (85%), Positives = 166/178 (92%)






Query: 1 MSRIGNKVITLPAGVEIINKDNVVTVKGPKjt3QLTREFNKNIGITVEGTEV^VIl^E^^ 60 

MSRIGNK:VIT+PAGVE+ N +W+TVKGPKB+LTREFNKNI I VEGTE+TV RPNDSKE 
Sbjct: 1 MSRIGNKVITMPAGVELTNNNN^/ITVKGPKGELTREENKNIEIKVEGTEITVVRPNDSKE 60 

Query: 61 MKTIHGTTRANLNNMWGVSEGFKKALEMRGVGYRAQLQGSKLVLSVGKSHQDEVEAPEG 120 

MKTIHGTTRANLNNMWGVSEGFKK LEM+GVGYRAQLQG+KLVLSVGKSHQDEVEAPEG 
Sbjct: 61 MKTIHGTTRANLNSlMWGVSEGFKKDLEMKiGVGYRAQLQGTKLVLSVGKSHQDEVEAPEG 120 

Query: 121 VTFEVPTPTTINVI6INKESVGQTAAYTOSLRSPEPYKGKGIRYVGEFVRRKE6KTGK 178

+TF V PT+I+V GINKE VGQTAAY+RSLRSPEPYKGKGIRYVGE+VR KEGKTGK 
Sbjct: 121 ITFTVANPTSISVEGINKEVVGQTARYIRSIJlSPEPYKGKGIRYVGEYVRLKEGKrGK 178 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2054 

A DNA sequence (GBSx2165) was identified in S.agalactiae <SEQ ID 6347> which encodes the amino 
acid sequence <SEQ ID 6348>. This protein is predicted to be 30S ribosomal protein S8 (rpsH). Analysis of
this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4356 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB06813 GB:L47971 ribosomal protein S8 [Bacillus subtilis]
Identities = 100/132 (75%), Positives = 116/132 (87%)

Query: 1 MVMTDPIADFLTRIRNANQAKHEVLEVPASNIKKGIADILKREGFVKNVEVIEDDKQGII 60 

MVMTDPIAD LTRIKNAN +HE LE+PAS +K+ IA+ILKREGF+++VE +ED KQGII 
Sbjct: 1 MV>m)PIADMLTRIRNM]MVRHEKLEIPASKLKREIAEILKREGFIRDVEFVEDSKiQGII 60 


Query: 61 RVFLKYGQNGERVITNLKRISKPGLRVYTKHEDMPKVLNGLGIAIVSTSEGLLTDKEARQ 120 

RVFLKYGQN ERVIT LKRISKPGLRVY K ++P+VLNGLGIAI+STS+G+LTDKEAR 
Sbjct: 61 RVFLKYGQNNERVITGLKRISKPGLRVYAKSNEVPRVLNGLGIAIISTSQGVLTDKEARA 120 

Query: 121 KNIGGEVLAYIW 132

K GGEVLAY+W 
Sbjct: 121 KQAGGEVLAYVW 132

A related DNA sequence was identified in S.pyogenes <SEQ ID 6349> which encodes the amino acid
sequence <SEQ ID 6350>. Analysis of this protein sequence reveals the following:

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4327 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 122/132 (92%), Positives = 129/132 (97%)



Query: 1 MVMTDPIADFLTRIRNANQAKHEVLEVPASNIKKGIADILKREGFVKNVEVIEDDKQGII 60
MVMTDPIADFLTRIRNANQ KHEVLEVPASNIKKGIA+ILKREGFVKNVEVIEDDKQGII
Sbjct: 1 MVMTDPIADFLTRIRNANQVKHEVLEVPASNIKKGIAEILKREGFVKNVEVIEDDKQGII 60

Query: 61 RVFLKYGQNGERVITNLKRISKPGLRVYTKHEDMPKVLNGLGIAIVSTSEGLLTDKEARQ 120
RVFLKYG+NGERVITNLKRISKPGLRVY K +DMPKVLNGLGIAI+STSEGLLTDKEARQ
Sbjct: 61 RVFLKYGKNGERVITNLKRISKPGLRVYAKRDDMPKVLNGLGIAIISTSEGLLTDKEARQ 120

Query: 121 KNIGGEVLAYIW 132
KN+GGEV+AY+W
Sbjct: 121 KNVGGEVIAYVW 132

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
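
The Identities and Positives counts reported for an alignment can be recounted from its rows. The following Python sketch is an illustration only and is not part of the original analysis. It assumes the usual BLAST-style middle line, in which a letter marks an identical residue, "+" marks a conservative substitution and a space marks neither, and it handles gap-free rows only. The function name `score_alignment` is hypothetical.

```python
def score_alignment(query, midline, sbjct):
    """Count identities and positives for one gap-free alignment row."""
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("alignment rows must have equal length")
    # An identity is a position where both sequences carry the same residue.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s)
    # Every non-blank midline character (letter or '+') counts as a positive.
    positives = sum(1 for m in midline if m != " ")
    # Cross-check: every letter in the midline should sit on an identical pair.
    for q, m, s in zip(query, midline, sbjct):
        if m.isalpha() and not (q == m == s):
            raise ValueError("midline letter does not match both rows")
    return identities, positives


# Last row of the GAS/GBS alignment above (positions 121-132).
print(score_alignment("KNIGGEVLAYIW", "KN+GGEV+AY+W", "KNVGGEVIAYVW"))  # (9, 12)
```

Summing the per-row results over all rows of an alignment gives totals that can be compared with the Identities and Positives values printed in its header.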



Example 2055 

A DNA sequence (GBSx2166) was identified in S.agalactiae <SEQ ID 6351> which encodes the amino
acid sequence <SEQ ID 6352>. This protein is predicted to be ribosomal protein S14 (rpsN). Analysis of 
this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3833 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11905 GB:Z99104 ribosomal protein S14 [Bacillus subtilis]
Identities = 47/61 (77%), Positives = 53/61 (86%)

Query: 1 MAKKSMIAKNKRPAKFSTQAyTRCEKCGRPHSWRKFQLCRVCFRDIAYKGQVPGVTKAS 60 

MAKKSMIAK +R KF Q YTRCE+CGRPHSV RKF+LCR+CFR+LAYKGQ+PGV KAS 
Sbjct: 1 MAKKSMIAKQQRTPKFKVQEYTRCERCGRPHSVIRKFKLCRICFREIAYRGQIPGVKKAS 60 


Query: 61 W 61 
W 

Sbjct: 61 W 61 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6353> which encodes the amino acid 
sequence <SEQ ID 6354>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4747 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 55/61 (90%), Positives = 59/61 (96%)

Query: 1 MAKKSMIAKNKRPAKFSTQAYTRCEKCGRPHSVYRKFQLCRVCFRDIAYKGQVPGVTKAS 60 

+AKKSMIAKNKRPAK STQAYTRCKKCGRPHSVYRKF+LCRVCFR+LAYEGQ+PGV KAS 
Sbjct: 1 lAKKSMIAKNKKPAKHSTQAYTRCEKOSRPHSVYRKFKriCRVCFRELAYKGQIPGVV^ 60 

Query: 61 W 61 
W

Sbjct: 61 W 61 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2056 

A DNA sequence (GBSx2167) was identified in S.agalactiae <SEQ ID 6355> which encodes the amino 
acid sequence <SEQ ID 6356>. This protein is predicted to be 50S ribosomal protein L5 (rplE). Analysis of
this protein sequence reveals the following:

Possible site: 48 

>>> Seems to have no N-terminal signal sequence



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB03865 GB:AP001507 ribosomal protein L5 (BL6) [Bacillus halodurans]
Identities = 143/178 (80%), Positives = 162/178 (90%)

Query: 3 mLKEKYTNEVVPALTEKENYSSVMAVPKVEKIVimGVGriAVSHRK^ 62 

NRLKEKY E+VP+LTEKENYSSVMaVPK+EKIV+NMGVGDAV NAK L+KA EL 1+ 
Sbjct: 2 NRLKEKSQKEIVPSLTEKFOTSSV^IAVPKLEKIVVlMS^7GnAVQNRKRLDKATO 61 

Query: 63 GQKPLITKAKKSIAGFRLREGVAIGAKVTLRGERMYEFLDKLVSVSLPRVRDFHGVPTKS 122 

GQKP+ITKAKKSIAGF+LREG+ IGAKVTLRGERMYEFLDKL+SVSLPRVRDF G+ K+ 
Sbjct: 62 GQKPIITKAKKSIAGFKLREGMPIGAKVTLRGERMYEFLDKLISVSLPRVRDFRGISKKA 121 

Query: 123 PDGRGim'LGVKEQLIFPEINFDDVDKVRGriDIVIVTTANTDEESRELIiKiGLGMPFAK 180 

FDGRGNYTLGVKEQLIFPEI++D VDKVRG+D+VIVTTA+TDEE+RELL +GMPF K 
Sbjct: 122 FIXSRGim'IjGVKEQLIFPEIDYDKVDKVRGMDVVIVrTASTDEEARELLSQMGMPFQK 179 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6357> which encodes the amino acid
sequence <SEQ ID 6358>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1845 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Final Results

bacterial cytoplasm --- Certainty=0.1793 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 177/180 (98%), Positives = 180/180 (99%)

Query: 1 MflNRLKEKYTNEWPALTEKFISrySSVmVPKA/EKIVIjmGVGDAVSimKNLEK^^^ 60

MftmLKEKYTlffiV+PALTEKBOT+SVMftVPKVEKIVIiimGVGDaVSN^^ 
Sbjct: 1 MftNRLKEKyTNEVIPM,TEKEWraSVMftVPKVEKIVIJ«MGV^^ 60 

Query: 61 ISGQKPLITKAKKSIAGFRLREGVAIGAKVTLRGERMYEFLDKLVSVSLPRVRDFHGVPT 120

ISGQKPLITKAKKSIAGFRLREGVAIGAJCVTLRGERMYEFLDKLVSVSLPRVRDFHGVPT 
Sbjct: 61 ISGQKPLITKAKKSIAGFRLREGVAIGAKVTLRGERMYEFLDKLVSVSLPRVRDFHGVPT 120 

Query: 121 KSFDGRGNYTLGVKEQLIFPEINFDDVDKVRGLDIVIVTTANTDEESRELLKGLGMPFAK 180

KSFDGRGNYTLGVKEQLIFPEI+FDDVDKVRGLDIVIVTTANTDEESRELLRGLGMPFAK 
Sbjct: 121 KSFDGRGtOTLGVKEQLlFPEISFDDVDKVTilGLDIVIVTTAiraDEESREIiLKGLGMPFAK 180 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2057 

A DNA sequence (GBSx2169) was identified in S.agalactiae <SEQ ID 6359> which encodes the amino 
acid sequence <SEQ ID 6360>. This protein is predicted to be 50S ribosomal protein L24 (rplX). Analysis
of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1850 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33285 GB:AF126061 RpL24 [Streptococcus pneumoniae]
Identities = 89/101 (88%), Positives = 94/101 (92%)






Query: 1 MFVKKGDKURVIAGKDKGTEAVVLKALPKVNKVVVEGVALIKKHQKPNNENPQGAIVEKE 60 

MFVKKGDKVRVIAGKDKGTEAWL ALPKVNKV+VEGV ++KKHQ+P NE PQG I+EKE 
Sbjct: 1 MFVKKBDKVRVIAGKDKBTEAVVLTALPKVNKVIVEGVNIVKKHQRPirraLPQGGIIEKE 60 

Query: 61 APIHVSNVQVLDKNGVAGRVGYKVVDGKKVRYNKKSGEVLD 101 

A IHVSNVQVLDKNGVAGRVGYK VDGKKVRYNKKSGEVIiD 
Sbjct: 61 AAIHVSNVQVLDKNGVRGRVGYKFVDGKKVRVNKKSGEVLD 101 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6361> which encodes the amino acid
sequence <SEQ ID 6362>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1850 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 95/101 (94%), Positives = 99/101 (97%)

Query: 1 MFVKKGDKVRVIAGKDKGTEAVVLKALPK^KVVVEGVALIKOTQKPNNENPQQAIVEKE 60 
MFVKKGDKVRVIAGKDKGTEAWLKALPKVNKV+VEGV +IKKHQKPN ENPQGAIVEKE 
Sbjct: 1 MFVKKGDKVRVIAGKDKGTEAVVLKALPKVNKVIVEGVGMIKKHQKENTENPQGAIVEKE 60

Query: 61 APIHVSNVQVLDKNGVAGRVGYKOTDGKKVRYNKKSGEVLD 101 
APIHVSNVQVLDKNGVAGR+GYKVVDGKKVRY+KKSGEVLD
Sbjct: 61 APIHVS]WQVI£)KNGVA.GRlGYK\AroGKKTOYSKKSGEVLD 101

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2058

A DNA sequence (GBSx2170) was identified in S.agalactiae <SEQ ID 6363> which encodes the amino 
acid sequence <SEQ ID 6364>. This protein is predicted to be 50S ribosomal protein L14 (rplN). Analysis
of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1004 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33284 GB:AF126061 RpL14 [Streptococcus pneumoniae]
Identities = 116/122 (95%), Positives = 120/122 (98%)

Query: 1 MIQQETRLKVADNSGAREILTIKVLGGSGRKFANIGDVIVASVKQATPGGAVKKGDWKA 60 

MIQ ETRLKVADNSGAREILTIKVLGGSGRKFANIGDVIVASVKQATPGGAVKKGDWKA 
Sbjct: 1 MIQTETRLKVADNSGAREILTIKVLGGSGRKFAKIGDVIVASVKQATPGGAVKKGDWKA 60 

Query: 61 VIVRTKTGARRPDGSYIKFDDNAAVIIRDDKTPRGTRIFGPVARELREGGYMKIVSLAPE 120

VIVRTK+GARR DGSYIKFD+N&AVIIR+DKTPRGTRIFGPVRRELREGG+MKIVSLaPE 
Sbjct: 61 VIVRTKSGARRADGSYIKFDENAAVIIREDKTPRGTRIPGPVARELREGGFMKIVSLAPE 120 

Query: 121 VL 122 
VL

Sbjct: 121 VL 122 

A related DNA sequence was identified in S. pyogenes <SEQ ID 6365> which encodes the amino acid 
sequence <SEQ ID 6366>. Analysis of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1004 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 122/122 (100%), Positives = 122/122 (100%)

Query: 1 MIQQETRLKVADNSGAREILTIKVLGGSGRKFANIGDVIVASVKQATPGGAVKKGDVVKA 60 

MIQQETRLKVADNSGAREILTIKVLGGSGRKFANIGDVIVASVKQATPGGAVKKGDWKA 
Sbjct: 1 MIQQETRLKVADNSGAREILTIKVLGGSGRKFANIGDVIVASVKQATPGGAVKKGDWKA 60 

50 Query: 61 VIVRTKTGARRPDGSYIKFDDNAAVIIRDDKTPRGTRIFGPVARELREGGYMKIVSLAPE 120 

VIVRTKTGRRRPDGSYIKTOimAVIIRDDICrPRGTRIFGPVRRELREGGYMKIVSLAPE 
Sbjct: 61 VIVRTRTGARRPDGSYIKFDDNiiAVIIRDDKTPRGTRIFGPVaRELREGGYMKIVSLAPE 120 

Query: 121 VL 122
VL

Sbjct: 121 VL 122

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2059 

A DNA sequence (GBSx2171) was identified in S.agalactiae <SEQ ID 6367> which encodes the amino 
acid sequence <SEQ ID 6368>. Analysis of this protein sequence reveals the following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3415 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33283 GB:AF126061 RpS17 [Streptococcus pneumoniae]
Identities = 82/86 (95%), Positives = 83/86 (96%)

Query: 1 MERHQRKTLYGRVVSDKMDKTITVVVETKEmPVYGKRINYSKKyKAHDENNVAKEGDIV 60 
MERN RK L GRVVSDKMDKTITVVVETKKNHPVYGKRINYSKICyKAHDENWAKEGDIV 
Sbjct: 1 MERlWTOKATLVGRVVSDKtClKTlTVVVETKRNHPVYGKRINYSKKyKAHDENNVAKEGDIV 60

Query: 61 RIMETRPLSATKRFRLVKWEKAVII 86 

RIMETRPLSATKRERLVEWE+AVI I 
Sbjct: 61 RIMETRPLSATKRPRLVEWEEAVII 86 


A related DNA sequence was identified in S.pyogenes <SEQ ID 6369> which encodes the amino acid 
sequence <SEQ ID 6370>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3415 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 86/86 (100%), Positives = 86/86 (100%)

Query: 1 MERNQRKTLYGRVVSDKMDKTITVVVETKRNHPVYGKRINYSKKYKAHDENNVAKEGDIV 60
MERNQRKTLYGRVVSDKMDKTITVVVETKRNHPVYGKRINYSKKYKAHDENNVAKEGDIV
Sbjct: 1 MERNQRKTLYGRVVSDKMDKTITVVVETKRNHPVYGKRINYSKKYKAHDENNVAKEGDIV 60

Query: 61 RIMETRPLSATKRPRLVEWEKAVII 86 

RIMETRPLSATKRFRLVEWEKAVIl 
Sbjct: 61 RIMETRPLSATKRFRLVEWEKAVIl 86

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2060 

A DNA sequence (GBSx2172) was identified in S.agalactiae <SEQ ID 6371> which encodes the amino
acid sequence <SEQ ID 6372>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4329 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33282 GB:AF126061 RpL29 [Streptococcus pneumoniae]
Identities = 58/68 (85%), Positives = 64/68 (93%)

Query: 1 MKLQEIKDFVKELRGLSQEELAKKEIffiLKKELEDLRFQAaAGQLEKl'ARLDEW^ 60

MKL E+K+FVKELRGLSQEELAK+ENELKICELF+LRFQAA GQLE+TARL EVKKQIAR+ 
Sbjct: 1 MKIOTVKEFVKELRGLSQEEIAKRENELKKELFELRFQRATGQLEQTJUy^KEVKKQIJ^ 60 

Query: 61 KTVQSEMK 68 
KTVQSE K

Sbjct: 61 KTVQSEAK 68 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2061

A DNA sequence (GBSx2174) was identified in S.agalactiae <SEQ ID 6373> which encodes the amino 
acid sequence <SEQ ID 6374>. This protein is predicted to be RpL16 (rplP). Analysis of this protein 
sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4574 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33263 GB:AF126059 RpL16 [Streptococcus pneumoniae]
Identities = 135/137 (98%), Positives = 137/137 (99%)

Query: 1 MLVPKRVKHRREFRGKMRGEAKGGKEVSFGEYGLQATTSHWITNRQIEAARIAMTRYMKR 60
MLVPKRVKHRREFRGK^ro6EaKGGKEV+FGEYGLQATTSHWITl!roQIEAftRIAt^
Sbjct: 1 MLVPKRVKHRREFRGKMRGEARGGKEVRFGEYGIKJATTSHWITNRQIEAftRIAMTRyM^ 60

Query: 61 GGKVWIKIFPHKSYTAKAIGVRMGSGKGAPEGWVAPVKRGKVMFEIAGVSEEVAREALRL 120
GGK\miKIFPHKSYTAKAIGVRMGSGKGAPEGWVAPVKRGKVMFEIAGVSEE+AREaLRIj
Sbjct: 61 GGKOTIKIFPHKSYTAKAIGVRMGSGKGSiPEGWVRPVKRGKVMPEIAGVSEEIAREaLRL 120

Query: 121 ASHKLPVKCKFVKREAE 137
ASHKLPVKCKFVKREAE
Sbjct: 121 ASHKLPVKCKFVKREAE 137





A related DNA sequence was identified in S.pyogenes <SEQ ID 6375> which encodes the amino acid 
sequence <SEQ ID 6376>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4574 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 136/137 (99%), Positives = 137/137 (99%)





Query: 1 MLVPKRVKHRREFRGKMRGEAKGGKEVSFGEYGLQATTSHWITNRQIEAARIAMTRYMKR 60
MLVPKRVKHRREFRGKMRGEAKGGKEVSFGEYGLQATTSHWITNRQIEAARIAMTRYMKR
Sbjct: 1 MLVPKRVKHRREFRGKMRGEAKGGKEVSFGEYGLQATTSHWITNRQIEAARIAMTRYMKR 60

Query: 61 GGKVWIKIFPHKSYTAKAIGVRMGSGKGAPEGWVAPVKRGKVMFEIAGVSEEVAREALRL 120
GGKVWIKIFPHKSYTAKAIGVRMGSGKGAPEGWVAPVKRGKVMFEIAGVSEE+AREALRL
Sbjct: 61 GGKVWIKIFPHKSYTAKAIGVRMGSGKGAPEGWVAPVKRGKVMFEIAGVSEEIAREALRL 120

Query: 121 ASHKLPVKCKFVKREAE 137
ASHKLPVKCKFVKREAE
Sbjct: 121 ASHKLPVKCKFVKREAE 137





Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2062 

A DNA sequence (GBSx2175) was identified in S.agalactiae <SEQ ID 6377> which encodes the amino 
acid sequence <SEQ ID 6378>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3758 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33280 GB:AF126061 RpS3 [Streptococcus pneumoniae]
Identities = 200/208 (96%), Positives = 203/208 (97%)



Query: 10 MRVGIIRDWDAKWYAEKEYADYLHEDLAIRKFINKELADASVSTIEIERAVNKVIVSLHT 69
MRVGIIRDWDAKWYAEKEYADyLHEDIiAIRKF+ KELADA+VSTIEIERAVNKV VSLHT
Sbjct: 1 MRVGIIRDWDAKWYAEKEYADYLHEDIiAIRKFVQKELADAAVSTIEIERAVNKVNVSLHT 60

Query: 70 AKPG^WIGKGGANVIffiLRGQIWKLTGKQVHINIIEIKQPDLnftHLVGENIARQLEQRVAF 129
AKPGMVIGRGGftNVDALR +LNKLTGRQVHINIIEIRQPDIinAHLVGE lARQLEQRVRF
Sbjct: 61 AKPGMVIGKGGaiSrajALRAKiaiKLTGKQVHINIIEIKQPDLnaHLVGEGIARQLEQRV^ 120

Query: 130 RRAQKQAIQRTMRAGAKGIKTQVSGRIiNGADlARAEGYSEGTVPLHTLRADIDYAWEEAD 189
RRAQKQAIQR MRAGAKGIKTQVSGRIiNGADIARAEGYSEGTVPIiHTLRADIDYAWEEAD
Sbjct: 121 RRAQRQAIQRAMRAGAKGIKTQVSGRLNGRDIARAEGYSEGTVPLHTLRADIDYAWEEM 180

Query: 190 TTYGKLGVKVWIYRGEVLPARKNTKGGK 217
TTYGKLGVKVWIYRGEVLPARKNTKGGK
Sbjct: 181 TTYGKLGVKVWIYRGEVLPARKNTKGGK 208





A related DNA sequence was identified in S.pyogenes <SEQ ID 6379> which encodes the amino acid 
sequence <SEQ ID 6380>. Analysis of this protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3758(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2063

A DNA sequence (GBSx2176) was identified in S.agalactiae <SEQ ID 6381> which encodes the amino 
acid sequence <SEQ ID 6382>. This protein is predicted to be 50S ribosomal protein L22 (rplV). Analysis 
of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2704(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD33279 GB:AF126061 RpL22 [Streptococcus pneumoniae]
Identities = 99/114 (86%), Positives = 106/114 (92%)

Query: 1 MAEITSAKAMARTVRVSPRKTRLVLDLIRGKNVADAIAILKFTPNKAARVIEKTLNSAIA 60 

MAEITSAKAMARTVRVSPRK+RLVLD IRGK+VADAIAIL FTPNKAA +1 K LNSA+A 
Sbjct: 1 MAEITSAKAMARTVRVSPRKSRLVLDNIRGKSVaDAIAILTFTPNKAAEIILKV]^ 60 

Query: 61 NAENNFGtEKSNLWSETEANEGPTMKRFRPRAK^ 114 

NAENNFGL+KANLWSE F7UJEGPTMKRFRPRAKGSASPINKRT H+TV V+EK 
Sbjct: 61 NAENNFGI^KAtmWSEAFAliEGPTMKRFRPRAKGSASPINKRTAHITVAVAEK 114 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6383> which encodes the amino acid 
sequence <SEQ ID 6384>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2794(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 113/114 (99%) , Positives = 113/114 (99%) 

Query: 1 MAEITSAKAMARTVRVSPRKTRLVLDLIRGKNVADAIAILKFTPNKAARVIEKTUISAIA 60 

MftEITSAKftMaRTVKVSPRRTRLVLDLIRGK VRDAIAILKFTPNKaARVIEKTMTSAIA 
Sbjct: 1 MAEITSAKBMARTVRVSPRKTRLVUDLIRGKKVADAIAILKPTENKAARVIEKTLNSAIA 60 

Query: 61 NAENNFGLEKANLWSETFANEGPTMKRFRPRAKGSASPINKRTTHVTVWSEK 114 

NAENNFGLEKANLWSETFANEGPTMKRFRPRAKGSASPINKRTTHVTVWSEK 
Sbjct: 61 NRENNFGLEKANLVVSETPANEGPTMKRFRPRAKBSASPINKRTTHVTVVVSEK 114 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
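
The percentages quoted next to the identity and positive counts in this example (99/114 as 86%, 106/114 as 92%, 113/114 as 99%) appear to be truncated to a whole number rather than rounded. The following is a minimal sketch of that arithmetic, not the alignment program's own code; the function name `blast_percent` is hypothetical and the truncation rule is an assumption inferred from the figures above.

```python
def blast_percent(matches: int, aligned_length: int) -> int:
    """Return matches/aligned_length as a whole percentage, truncated.

    Assumption: the reported percentages drop the fractional part
    (99/114 = 86.8% is reported as 86%), so integer division is used.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return (100 * matches) // aligned_length


if __name__ == "__main__":
    # Counts taken from the alignments in this example.
    for label, matches, length in [
        ("Identities (S. pneumoniae RpL22)", 99, 114),
        ("Positives (S. pneumoniae RpL22)", 106, 114),
        ("Identities (GAS vs GBS)", 113, 114),
    ]:
        print(f"{label} = {matches}/{length} ({blast_percent(matches, length)}%)")
```

Rounding to the nearest integer would give 87% and 93% for the first two pairs, which is why truncation is assumed here.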

Example 2064 

A DNA sequence (GBSx2177) was identified in S.agalactiae <SEQ ID 6385> which encodes the amino 
acid sequence <SEQ ID 6386>. This protein is predicted to be 30S ribosomal protein S19 (rpsS). Analysis
of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2991(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>


The protein is similar to ribosomal protein S19 from S.pneumoniae. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6387> which encodes the amino acid
sequence <SEQ ID 6388>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence



An alignment of the GAS and GBS proteins is shown below. 

Identities = 92/92 (100%) , Positives = 92/92 (100%) 

Query: 1 MGRSLKKGPFVDEHLMKKVEAQMTOEKKKVIKTWSRRSTIFPSFIGYTIAVYDGRKHVPV 60 

MGRSLIOCGPFVDEHljMKKVEAQZaroEKKKVlKTWSRRSTIFPSFIGYTIAVYIXSRKHVPV 
Sbjct: 19 MGRSLKKGPFVDEHLMKKVEAQftlTOEKKKVIKTWSRRSTIFPSFIGYTIAVYDGRKH^ 78 

Query: 61 yiQEDMVGHKLGEFAPTRTYKGHRADDKKTRR 92 

YIQEDMVGHKLGEFAPTRTYKGHAADDKKTRR 
Sbjct: 79 YIQEDMVGHKLGEFAPTRTYRGHAADDKKTRR 110 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2065 

A DNA sequence (GBSx2178) was identified in S.agalactiae <SEQ ID 6389> which encodes the amino 
acid sequence <SEQ ID 6390>. This protein is predicted to be L2 (rplB). Analysis of this protein sequence 
reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC45959 GB:U43929 L2 [Bacillus subtilis] 
Identities = 208/277 (75%) , Positives = 239/277 (86%) 

Query: 1 MGIKWKPTTNGRRIMTSLDFAEITTOT'PEKSLLVSLKNKAGRNIBJGRITVRHQGGGHKR 60 

M IK YKP++NGRR MT+ DFAEITT+ PEKSLL L K GRNN G+tn/RHQCGGHKR 
Sbjct: 1 MAIKKYKPSSNGRRGMTTSDFAEITTDKPEKSIiLAPLHKKGGRHNQGKLTVRHQGGGHKR 60 

Query: 61 HYRLIDFKRNKDGVEAVVKTIEYDENRTANIALVHYTDGVKAYIIAPRGLEVGQRIISGP 120 

YR+IDFKR+KD6+ V T+EYDPNR+ANIAL++Y DG K YILaPKB++VG ++SGP 
Sbjct: 61 QyRVIDFKRDKDGIPGRVAT\raroPNRSANIALI]mI)GEKRYIIAPKGIQVGTEV^^ 120 

Query: 121 EADIKVGNRLPLANIPVGTVIHNIELQPGKGAELIRjyVGASAQVLGQEGKYVLVRLQSGE 180 

EADIKVGNALPL NIPVGTV+HNIEIi+PGKG +L+R+AG SAQVLG+EGKYVLVRL SGE 
Sbjct: 121 EBDIKVGNaLPLINIPVGTVVHNIELKPGKGGQLVRSAGTSAQVLGKEGKYVLVRLNSGE 180 



Final Results

bacterial cytoplasm --- Certainty=0.3319(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

Final Results

bacterial cytoplasm --- Certainty=0.3182(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>



Query: 181 VRMILGTCRRTIGTVGNEQQSLVNIGKAGRNRWKSVRPTVRGSVMNPHDHPHGGGEGKaP 240 

            VRMIL CRA+IG VGNEQ Ii+NIGKAGR+RWKG+RPTVRGSVMNPNDHPHGGGEG+AP
Sbjct: 181 VRMILSACRASIGQVGNEQHELINIGKAGRSRWIOSIRPTVRGSVMNPNDHPHGGGEGRAP 240

Query: 241 VGRKAPSTPWGKPMKSLKTRNKKAKSDKLIVRRRNQK 277
            +GRK+P +PWGKP LG RTR KK KSDK IVRRR K

Sbjct: 241 IGRKSPMSPWGKPTLGFKTRKKKNKSDKFIVRRRKNK 277 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6391> which encodes the amino acid
sequence <SEQ ID 6392>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2560(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 264/277 (95%) , Positives = 276/277 (99%) 

Query: 1 MGIKVYKPTTNGRRNMTSLDFAEITTNTPEKSLLVSLKNKAGRNNNGRITVRHQGGGHKR 60
         +GIKVYKPTTNGRRimTSUDFAEITT+TPEKSLLVSLK+KAGRNNNGRITVRHQGGGHKR
Sbjct: 1 VGIKVYKPTTNGRRNMTSLDFAEITTSTPEKSLLVSLKSKAGRNNNGRITVRHQGGGHKR 60

Query: 61 HYRLIDFKRNKDGVEAWKTIEYDPNRTMIAi:,VHYTDGVKa.yiriAPKGLEVGQRIISGP 120
          HYRLIDFKRNKIX3VERVVIOTIEYDPNRTANIALVHYTDGVKAYI4^APKGLETO
Sbjct: 61 HYRLIDFKRNKDGVEAVVKTIEYDPNRTANIMiVHYTDGVKAYIIAPKGLEVGQRIVSGP 120

Query: 121 EADIKVGNALPI^IPVGTVIHNIELQPGKGaELIRAaGASAQVLGQEGKYVLVRLQSGE 180
           +MlIKVGN2\LPIi2a5IPVGTV+HNIEL+PGKG EL+RAAGJ^Sl^VLGQEGKYVLWLQSGE
Sbjct: 121 DftDIKVGNALPI^IPVGTVVHNIELKPGRGGELVRaAGASAQVLGQEGKyVLV^ 180

Query: 181 VRMIIX3TCRATIGTVGNEQQSLVNIGKAGRNRWKGVRPTVRGSVMNENDHPHGGGEGKM 240
           VRMILGTCRATIGTVGNEQQSLVNIGKAGR+RWKG+RPTVRGSVMNPNDHPHGGGEGKAP
Sbjct: 181 VRMILGTCRATIGTVGNEQQSLVNIGKAGRSRWKGIRPTVRGSVMNPNDHPHGGGEGKAP 240

Query: 241 VGRKAPSTPWGKPALGLKTRNKKAKSDKLIVRRRNQK 277
           VGRKAPSTPWGKPALGLKTRNKKAKSDKLIVRRRN+K
Sbjct: 241 VGRKAPSTPW6KPALGLKTRNKKAKSDKLIVRRRNEK 277

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2066 

A DNA sequence (GBSx2180) was identified in S.agalactiae <SEQ ID 6393> which encodes the amino 
acid sequence <SEQ ID 6394>. This protein is predicted to be 50S ribosomal protein L23 (rplW). Analysis
of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1669(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB03855 GB:AP001507 ribosomal protein L23 [Bacillus halodurans]
Identities = 56/92 (60%), Positives = 67/92 (71%), Gaps = 1/92 (1%)

Query: 2 NLYDVIKKPVITEKSWVALEaGKyTFEVDTRAHKLLIKQAVEAAFDGVKVASVNlVTVKP 61

N DVIK+PVITE+S + KYTFEVD RA+K IK A+E FD VKVA WT+ K 
Sbjct: 3 NARDVIKRPVITERSTEVMGDKKTrFEVDVRANKTQIKDAIEEIFD-VKVAKmr^ 61 

Query: 62 KAKRVGRYTGFTSKTKKAIITLTADSKAIELP 93 

K KR GRYTGFT++ KKAI+TLT DSK ++ F 
Sbjct: 62 KPKRFGRYTGFTARRKKAIVTLTPDSKELDFF 93 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6395> which encodes the amino acid 
sequence <SEQ ID 6396>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence



An alignment of the GAS and GBS proteins is shown below. 

Identities = 96/98 (97%) , Positives = 97/98 (98%) 

Query: 1 MNLYDVIKKPVITEKSMTOLEAGKYTFEVDTRAHKLLIKQAVEAAFDGVKVASVI^^ 60 

M^&YDVIKKPVITEKSM^-ALEAGKy^PEVr)TRAHKLLIKQAVEAAFDGVKVASV^^ VK 
Sbjct: 1 MNLYDVIKKPVITEKSMIALEAGKyiFEVDTRAHKLLIKQAVEAAFDGVKWASVN^ 60 

Query: 61 PKftKRVGRYTGFTSKTKKAIITLTADSKAIELFAAEAE 98 

PKAKRVGRYTGFTSKTKKAI ITLTADSKA.IELFAAEAE 
Sbjct: 61 PKAKRVGRYTGFTSKTKKAIITLTADSKAIELFAAEAE 98 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2067 

A DNA sequence (GBSx2181) was identified in S.agalactiae <SEQ ID 6397> which encodes the amino 
acid sequence <SEQ ID 6398>. This protein is predicted to be 50S ribosomal protein L4 (rplD). Analysis of
this protein sequence reveals the following:

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.54 Transmembrane 140 - 156 ( 139 - 156)

Final Results

bacterial cytoplasm --- Certainty=0.1617(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>



Final Results

bacterial membrane --- Certainty=0.1617(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45957 GB:U43929 L4 [Bacillus subtilis]
Identities = 130/207 (62%), Positives = 160/207 (76%)



Query: 1 MANVKLFDQTGKEVSSVELNEAIFGIEPNESVVFDVVISQRASLRQGTHAVKNRSAVSGG 60 

M V L++Q G +ELN ++FGIEPNESVVFD ++ QRASLRQGTH VKNRS V GG 
Sbjct: 1 MPiaffiLTOC3NC3STAGDIELNASVFGIEPNESVVFimilMQRASLRQGTHKVKNRS 60 



Query: 61 GRKPWRQKGTGRARQGSIRSPQWRGGGWFGPTPRSYGYKLPQKVRRLALKSVYSAKVAE 120 

GRKPWRQKGTGRARQGSIRSPQWRGGGWFGPTPRSY YKLP+KVRRLA+KSV S+KV + 
Sbjct: 61 GRKPraiQKGTGRARQGSIRSPQWRGGGVVFGPTPRSYSYKLPKKVRRLAIKSVLSSKVID 120 



Query: 121 DKWAVENLSFAAPKTAEFASVLSALSIDSKVLVILEEGNEFAALSAKNLP1WTVATATT 180 
+ + +E+L+ KT E A++L LS++ K L++ + NE ALSARN+P VTV A 
Sbjct: 121 MrIIVLEDLTLDTAKTKEMAAILKGLSVEKKMlIOTADflNEAValJSaRNIPGVTV^^^ 180

Query: 181 ASVLDIVNADKLLVTJCEAISTIEGVLA 207 
+VLD+VN +KLL+TK A+ +E VIA 
Sbjct: 181 IOTLDVVOTEKLLITIOUWEKVEEVIjR 207

A related DNA sequence was identified in S.pyogenes <SEQ ID 6399> which encodes the amino acid 
sequence <SEQ ID 6400>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2544(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/207 (96%) , Positives = 203/207 (97%) 

Query: 1 MflNVKLFDQTGKEVSSVELNEAIFGIEPNESWFDWISQRASLRQGTHAVKNRSAVSGG 60

MANVKLFDQTGKEVSSVELN+AIFGIEPNESVVFDWISQRASLRQGTHAVKNRSAVSGG 
Sbjct: 1 MANVKLFDQTGKEVSSVEIJSIDAIFGIBPNESWFDWISQRASLRQGTHAVKNRSAVSGG 60 

Query: 61 GIUOTOQKBTORftRQGSIRSPQWRGGGVVFGPTPRSYeYKLPQKVRRLaLiCSVYSaKVM 120 
GRKPWRQRGTGRARQGSIRSPQWRGGGVVFGPTPRSYGYKLPQKVRRLALKSVYSA^^

Sbjct: 61 GRKPWRQKGTGRaRQGSIRSPQTOGGGVVFGPTPRSYGYKLPQKVRRLALKSVYSAKVaE 120 

Query: 121 DKFVAVENLSFAAPKTAEFASVLSALSIDSKVLVILEEGNEFAALSARNLPNVTVATATT 180 
DKFVAVE LSFAAPKTAEFA VLSALSID+KVLV++EEGNEFAALSARNLPNVTVATA T 
Sbjct: 121 DKFVAVEGLSFAAPKTAEFAKVLSALSIDTKVLVLVEEGNEFAALSARNLPNVTVATAAT 180

Query: 181 ASVLDIVNBDKLLVTKEAISTIESVLA 207 

ASVLDIVNADKLLVTKEAISTIE VIA 
Sbjct: 181 ASVLDIVNaDKLLVTKEAISTIEEVLA 207 


Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2068 

A DNA sequence (GBSx2183) was identified in S.agalactiae <SEQ ID 6401> which encodes the amino
acid sequence <SEQ ID 6402>. This protein is predicted to be 50S ribosomal protein L3 (rplC). Analysis of
this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2090(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45956 GB:U43929 L3 [Bacillus subtilis]
Identities = 157/208 (75%) , Positives = 180/208 (86%) , Gaps = 2/208 (0%) 

Query: 1 MTKGILGKKVGMTQIFTESGEFIPVTVIEATPNVVLQVKTVETDGYEAVQVGFDDKREVL 60 
MTKGILG+K+GMTQ+F E+G+ IPVTVIEA PNWLQ ET E DGYEA+Q+GFDDKRE L

Sbjct: 1 MTKGILGRKIGMTQVFAENGDLIPVTVIEaAPNVVLQKKTAENDGYEAIQLGPDDKREKL 60 



Query: 61 SNKPAKGHVAKaNTAPKRFIREFKNIE--GLEVGaELSVEQFEAGDVVDVTGTSKGKGFQ 118 
SNKP KGHVAKA TAPKRF++E + +E EVG E+ VE F AG++VDVTG SKGKGFQ
Sbjct: 61 SNKPEKGHVAKftETAPKRFVKEIRGVEMimYEVGQEVKVEIFSAGEIVDVTGVSKGKG 120 

Query: 119 GVIKRHGQSRGPMaHGSRYHRRPGSMGPVAENRVFKNKRLAGRMGGNRVTO 178 
G IKRHGQSRGPM+HGSRYHRRPGSMGPV PNRVFK K L GRMGG ++TVC3ISILEIV+V

Sbjct: 121 GRIKRHGQSRGPMSHGSRraRRPGSMGPVDPiniWKGKLLPGRMGGEQIWQl^EIVKVD 180 

Query: 179 PEKNWLIKGNVPGAKKSLITIKSAVKA 206 
E+N++IjIKGNVPGAKKSLIT+KSAVK+ 
Sbjct: 181 AERMLLLIKGNVPGAKKSLITVKSAVKS 208

A related DNA sequence was identified in S.pyogenes <SEQ ID 6403> which encodes the amino acid 
sequence <SEQ ID 6404>. Analysis of this protein sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2090(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 205/208 (98%), Positives = 207/208 (98%) 

Query: 1 MTKGIIiGKKTOMTQIFTESGEFIPVTVIEATENVVI^2VKTVETIX3YEAVQVGFDDK^^ 60

MTKGIL6KKVGm:QIFTESGEFIPVTVIEATENVVI<3VKTVETI)GyEAVQVGFDDKREVL 
Sbjct: 1 MTKBir<3KK\ramQIFTESGEFIPVTVIEATElWVLQVKTVETIX3VEAVQVGFDDKRETO 60 

Query: 61 SNKPAKGHVAKANTAPKRFIREFKNIEGLEVGAELSVEQFEAGDWDVTGTSKGKGFQGV 120 
SNKPAKGHVAKANTAPKRFIREFKNIEGLEVGRELSVEQFEAGDWDVTG SKGKGFQGV

Sbjct: 61 SNKPAKGHVAKZUilTAPKRFIREFKNIEGLEVGaELSVEQFEAGDVVD^ 120 

Query: 121 IKRHGQSRGPMRHGSRYHRRPGSMGPVAENRVFKNKRIJVGRMGGNRVTVQlSmEIV^ 180 

IKRHGQSRGPMRHGSRYHRRPGSMGPVAPNRVFKNKRLAGRMGGNRVTVQNLEIVQVIPE 
Sbjct: 121 IKRHGQSRGPMAHGSRYHRRPGSMGPVAPNRVFKNKRLAGRMGGNRVTVQNLEIVQVIPE 180

Query: 181 KNWLIKGMVPGAKKSLITIKSAVKAAK 208 

KNV+L+KGNVPGAKKSLITIKSAVKAAK 
Sbjct: 181 KNVILVRGNVPGAKKSLITIKSAVKAAK 208 


Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2069 

A DNA sequence (GBSx2184) was identified in S.agalactiae <SEQ ID 6405> which encodes the amino 
acid sequence <SEQ ID 6406>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.43 Transmembrane 5 - 21 ( 5 - 21)

Final Results

bacterial membrane --- Certainty=0.1171(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 
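
Each "Final Results" block lists three candidate localizations with a certainty value and a verdict. The following is a minimal sketch, under stated assumptions, of how such a block could be read programmatically and the highest-certainty localization picked out. The names `Localization`, `parse_final_results` and `most_likely` are hypothetical, and the regular expression is an assumption about the line layout seen in this document (it also tolerates stray spaces inside the number).

```python
import re
from typing import List, NamedTuple


class Localization(NamedTuple):
    site: str         # e.g. "cytoplasm", "membrane", "outside"
    certainty: float  # value printed after "Certainty="
    verdict: str      # e.g. "Affirmative", "Not Clear"


# Assumed layout: "bacterial <site> --- Certainty=<number>(<verdict>) ..."
_RESULT = re.compile(
    r"bacterial\s+(?P<site>[A-Za-z]+)[\s\u2014-]*"
    r"Certainty\s*=\s*(?P<value>[0-9][0-9.\s]*?)\s*"
    r"\(\s*(?P<verdict>[^)]*?)\s*\)"
)


def parse_final_results(text: str) -> List[Localization]:
    """Extract every localization line found in a block of text."""
    results = []
    for match in _RESULT.finditer(text):
        value = re.sub(r"\s+", "", match.group("value"))
        try:
            certainty = float(value)
        except ValueError:
            continue  # skip lines whose number cannot be read
        results.append(
            Localization(match.group("site").lower(), certainty, match.group("verdict"))
        )
    return results


def most_likely(results: List[Localization]) -> Localization:
    """Return the entry with the highest certainty value."""
    return max(results, key=lambda item: item.certainty)


if __name__ == "__main__":
    block = """
    bacterial membrane --- Certainty=0.1171(Affirmative) < succ>
    bacterial outside --- Certainty=0.0000(Not Clear) < succ>
    bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>
    """
    print(most_likely(parse_final_results(block)))
```

Applied to the block for GBSx2184 above, the sketch selects the membrane entry, which agrees with the single INTEGRAL segment reported for that protein.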

Example 2070 

A DNA sequence (GBSx2185) was identified in S.agalactiae <SEQ ID 6407> which encodes the amino 
acid sequence <SEQ ID 6408>. This protein is predicted to be 30S ribosomal protein S10 (rpsJ). Analysis
of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3160(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB46363 GB:L29637 S10 ribosomal protein [Streptococcus mutans]
Identities = 98/102 (96%), Positives = 102/102 (99%) 

Query: 1 MANKKIRIRLKAYEHRTLDTAAEKIVETATRTGATVAGPVPLPTERSLYTIIRATHKYKD 60 
MANKKIRIRLKAYEHRTLDTAAEKIVETATRTGA+VAGPVPLPTERSLYT+IRATHKYICD

Sbjct: 1 MftNKKlRIRIiKAYEHRTLDTAAEKIVETATRTGASVAGPVPLPTERSLYTVIRATHKYKD 60 

Query: 61 SREQFEMRTHKRLVDIIOTTQKTVDftLMKLDLPSGVNVEIKL 102 
SREQFEMRraKRL+DI+NPTQKTVnALMKLDLPSGVNVEIKL 
Sbjct: 61 SREQFEMRTHKRLIDIVNPTQKTVDftLMKLDLPSGVNVEIKL 102

A related DNA sequence was identified in S.pyogenes <SEQ ID 6409> which encodes the amino acid 
sequence <SEQ ID 6410>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3160(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 102/102 (100%) , Positives = 102/102 (100%) 

Query: 1 MANKKIRIRLKAYEHRTLDTAREKIVETATRTGATVAGPVPLPTERSLYTIIRATHKYKD 60

MANKKIRIRIiKAYEHRTLDTAftEKIVETATRTGATVAGPVPLPTERSLYTI IRATHKYKD 
Sbjct: 1 MANKKIRIRLKAYEHRTLDTAftEKIVETATRTGATVAGPVPLPTERSLYTIIRATHKm) 60 

Query: 61 SREQFEMRTHKRLVDIINPTQKTVDALMKLDLPSCSVNVEIKL 102 
SREQFEMRTHKRLVDIINPTQKTVDALMKLDLPSGVNVEIKL

Sbjct: 61 SREQFEMRTHKRLVDIINPTQKTVDALMKLDLPSGVNVEIKL 102 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2071

A DNA sequence (GBSx2186) was identified in S.agalactiae <SEQ ID 6411> which encodes the amino 
acid sequence <SEQ ID 6412>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2538(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 2072 

A DNA sequence (GBSx2187) was identified in S.agalactiae <SEQ ID 6413> which encodes the amino 
acid sequence <SEQ ID 6414>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -11.41 Transmembrane 88 - 104 ( 79 - 110)
INTEGRAL Likelihood = -8.39 Transmembrane 304 - 320 ( 300 - 324)


INTEGRAL Likelihood = -?.58 Transmembrane 185 - ... ( 180 - ...)
INTEGRAL Likelihood = -5.63 Transmembrane 338 - 354 ( 331 - 357)
INTEGRAL Likelihood = -5.52 Transmembrane 240 - 256 ( 237 - 259)
INTEGRAL Likelihood = -4.99 Transmembrane 383 - ... ( 375 - ...)
INTEGRAL Likelihood = -3.82 Transmembrane 49 - ... ( 48 - ...)
INTEGRAL Likelihood = -...87 Transmembrane ... ( ... - 144)
INTEGRAL Likelihood = -2.81 Transmembrane 159 - 175 ( 159 - 177)
INTEGRAL Likelihood = -2.18 Transmembrane 30 - 46 ( 30 - 47)



Final Results 

bacterial membrane --- Certainty=0.5564(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06655 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 132/423 (31%) , Positives = 210/423 (49%) , Gaps = 16/423 (3%) 

Query: 7 IIQLAIPAMIENILQMLMGVVIWLVAQIflVVAVSGVSVaNNIITIYC3aiF--IALGASI 64 

+ L P IE +L MLMG D +++Q AV+ V V+N 1+ + +F +A G SI 
Sbjct: 11 LFALTWPIFIEILLHMLMGNftDTLMLSQYSDDAVAAVGVSNQILAVIIVMFGFVATGTSI 70 

Query: 65 ASLLAKSLAGSKKDDAISVCSQAIFLTLLIGAVLGIISIVFGQTFFKLLGTTKSVAQVGG 124 

L+A+ L ++++A V +1 L+ G VLG++ I FG K + S+ Q 
Sbjct: 71 --LVAQHLGAKERENAGKVAVVSIGANLIFGIVLGLLLIAFGPPILKAMQLDDSLLQEAT 128 

Query: 125 LYLAIVGGGWTLGMLTTLGSFLRVQGQPRLPMYVSIFVNFLNAVLSGFAIFEWR Y 180 

LYL IVGG V ++ T G+ LR, + MYV+I +N LN + + IF 

Sbjct: 129 LYLQIVGGFSVVQSLIMTAGAILRSHSPTKDVMYVTIGMNILNVIGNYLFIFGPFGIPVL 188 

Query: 181 GLVGVAVSTLIARLIGICILAKYL PIKKIIKRMTWKISAQIWNLALPSAGER 232 

G+ GVA+ST+++R IG+ ++A L P ++KR + + +PSAGE+ 

Sbjct: 189 GVTGVALSTWSRTIGLFVIAILLYKRIRGELPFAYLLKRFPRVELRNLLKIGIPSAGEQ 248 

Query: 233 LMMRAGDWIV&IWQLGTNWAGNAIGETLTQENYMPGIfilATATIILTAKYVGQKNRE 292 

L A +VI + +GT + +LF+++I TIL VGK + 

Sbjct: 249 LSYNASQLVITYFIAMMGTEALTTKVYTQNLMMFVFLFAVAIGQGTQILIGHQVGAKQIQ 308 

Query: 293 SIEETIQSSYYIGLVLMILISSFMLLAGKPLTQLFTNNPSAIKGSLIVILLSFVGVPATI 352 

+ S +1 + + + ++ PL +FT+NP + ++LL+ + P 

Sbjct: 309 AAYVRCFRSLWIAMTVSVSMAWFFAFSTPLLGIFTDNPDILSLGTTLLLLTIILEPGRA 368 

Query: 353 GTLVYTAAWQGLGNMIiPFYTTTIGMWLIRVVLGYLI^IVFEIXSLLGVWm 412

LV +++ G+KPY +MWIV + yiiLG+ LGL+GVW+A IAD PR 
Sbjct: 369 COTiWISSLRAAGDVKI'PVYLaiVSMWGIAVPIAYIiGLPIKSLGLIGVWIAFIADEWPRG 428 

Query: 413 LFL 415 
L + 

Sbjct: 429 LLM 431 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6415> which encodes the amino acid
sequence <SEQ ID 6416>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -5.26 Transmembrane 89 - 105 ( 85 - 108)
INTEGRAL Likelihood = -4.35 Transmembrane 305 - 321 ( 302 - 322)
INTEGRAL Likelihood = -3.82 Transmembrane 161 - 177 ( 161 - 180)
INTEGRAL Likelihood = -3.82 Transmembrane 192 - 208 ( 189 - 208)
INTEGRAL Likelihood = -3.77 Transmembrane 129 - 145 ( 128 - 151)
INTEGRAL Likelihood = -3.24 Transmembrane 242 - 258 ( 240 - 258)
INTEGRAL Likelihood = -2.81 Transmembrane 378 - 394 ( 377 - 394)
INTEGRAL Likelihood = -2.66 Transmembrane 339 - 355 ( 338 - 358)
INTEGRAL Likelihood = -2.60 Transmembrane 58 - 74 ( 58 - 75)
INTEGRAL Likelihood = -2.50 Transmembrane 32 - 48 ( 32 - 49)



Final Results 

bacterial membrane --- Certainty=0.3102(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>
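
The ten INTEGRAL entries reported for the S.pyogenes protein can be tabulated to give the number of predicted transmembrane segments and their order along the sequence. The sketch below is illustrative only: the tuple layout and the names `Segment` and `SEGMENTS` are hypothetical, the values are transcribed from the entries above, and reading a more negative likelihood as a stronger prediction is an assumption about the scoring convention.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value after "Likelihood ="
    core_start: int    # first residue of the reported segment
    core_end: int      # last residue of the reported segment
    ext_start: int     # first residue of the range in parentheses
    ext_end: int       # last residue of the range in parentheses


# Values transcribed from the INTEGRAL entries for <SEQ ID 6416>.
SEGMENTS: List[Segment] = [
    Segment(-5.26, 89, 105, 85, 108),
    Segment(-4.35, 305, 321, 302, 322),
    Segment(-3.82, 161, 177, 161, 180),
    Segment(-3.82, 192, 208, 189, 208),
    Segment(-3.77, 129, 145, 128, 151),
    Segment(-3.24, 242, 258, 240, 258),
    Segment(-2.81, 378, 394, 377, 394),
    Segment(-2.66, 339, 355, 338, 358),
    Segment(-2.60, 58, 74, 58, 75),
    Segment(-2.50, 32, 48, 32, 49),
]


def core_length(segment: Segment) -> int:
    """Number of residues in the reported segment (inclusive range)."""
    return segment.core_end - segment.core_start + 1


def by_position(segments: List[Segment]) -> List[Segment]:
    """Order segments from N-terminus to C-terminus."""
    return sorted(segments, key=lambda s: s.core_start)


def strongest(segments: List[Segment]) -> Segment:
    """Most negative likelihood (assumed to be the strongest prediction)."""
    return min(segments, key=lambda s: s.likelihood)


if __name__ == "__main__":
    print("segments:", len(SEGMENTS))
    print("strongest:", strongest(SEGMENTS))
    for seg in by_position(SEGMENTS):
        print(seg.core_start, seg.core_end, core_length(seg))
```

Every reported segment spans 17 residues, and the range in parentheses always contains the reported segment.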



The protein has homology with the following sequences in the databases: 

>GP:BAB06655 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 119/435 (27%) , Positives = 214/435 (48%) , Gaps = 14/435 (3%) 



Query: 9 IFSLALPSMIBNILQMLMGMVDNyLVAQIGLVAVSGVSIANNIISIYQSLFIALGAAVSS 68
         +F+L P IB +L MLMG D +++Q AV+ V ++N I+++ +F + S
Sbjct: 11 LFALTWPIFIEIIlLH^^LMGNRryTIl[&SQYSDDA^^y^VGVSNQILRVIIVMFGF^^^ 70

Query: 69 LIARSIGEmQNKQUlIYMAGVLQVTLLLSVGLGLLSVAGHHQVLEWLGAEASVTLVGGQY 128
          L+A+ +G + + L+ + LGLL +A +L+ + + S+ Y
Sbjct: 71 LVRQHLGAKERENAGKVAWSIGBNLIFGIVLGLLLIAFGPPILKAMQLDDSLLQEATLY 130

Query: 129 LSIVGGMIVSLGLLTSLGAIVRAQGYPKIPMQVSLLINVLNAIFSALSIY VWGFGL 184
           L IVGG V L+ + GAI+R+ + K M V++ +N+LN I + L 1+ + G+
Sbjct: 131 LQIVGGFSWQSLIMTAGAILRSHSFTKDVMYVTIGMNILNVIGNYLFIFGPFGIPVLGV 190

Query: 185 LGVAWATVLSRLVGVFLLCQF IPIKQVAKKLMRPLDKIIFDLSLPAAGERLM 236
           GVA +TV+SR +G+F++ +P + KR R + + + +P+AGE+L
Sbjct: 191 TGVALSTWSRTIGLFVIAILLYKRIRGELPFAYLLKRFPRVELRNLLKIGIPSAGEQLS 250

Query: 237 MBRAGDVLIIGIVVRFGTTALAGNAIGETLTQFNYMPGLftMATATIILVRRQLGGGKVTEI 296
           A ++I + GT AL + L F ++ +A+ T 1L+ Q+G ++
Sbjct: 251 YlSIASQLVITYFIAmGTEALTTKVYTQNLmFVFLFAVAIGQGTQILIGHQVGAKQIQAA 310

Query: 297 RYllREAFILSTLMMLVMGALTYLLGPSLLPLFTQNTDAQRSAMIVLLFSLLGAPATAGT 356
           + ++ + + M + + LL +FT N D +LL +++ P A
Sbjct: 311 YVRCFRSLWIAMTVSVSMAWFFAPSTPLLGIFTnNPDILSLGTTLLLLTIILEPGRACN 370

Query: 357 LVYTAWQGIfiKAIOiPFYATTIGMIWIRIGLGYVIGVVWQYGLIGVWMATVIJDN^^ 416
           LV + + 6 KPY +MWI + + Y++G+ GLIGVW+A + D R +
Sbjct: 371 LWISSLRAAGDVKFPVYLAIVSMWGIAVPIAYLLGLPLGLGLIGVWIAFIADEWFRGLL 430

Query: 417 LSKHFK--KYQEITF 429
           + ++ K+QE++F
Sbjct: 431 MIWRWRKGKWQEMSF 445

An alignment of the GAS and GBS proteins is shown below.

Identities = 219/418 (52%) , Positives = 316/418 (75%) 

Query: 5 KEIIQLaiPAMIENILQMLMGVTONYLVAQIX3VVAVSGVSVaNNIITIYQAIFIMK3^ 64 
++I IA+P+MIENILQMLMG+VDSm:iVAQ+G+VAVSGVS+A]raiI+IYQ++F

Sbjct: 7 RKIFSLALPSMIENILQMLMGMVDNYLVAQIGLVAVSGVSIAOTIISIYQSLFIAI^^ 66 

Query: 65 ASLLAKSLAGSKKDDAISVCSQAIFLTLLIGAVLGIISIVFGQTFFKLLGTTKSVAQVGG 124 
+SL+A+S+ + ++ ++ + + +TLIj+ LG++S+ + LG SV VGG 

Sbjct: 67 SSLIARSIGENNQNKQMIYMAGVLQVTLLLSVGIiGLLSVaGHHQVLEmG 126

Query: 125 LYLAIVGGGVVTLG^ILTTI,6SFIlRVQGQPRLP^m?SIFVNFImVLSGFAIPEWRYGLVG 184 

YL+IVGG +V+LG+LT+LG+ +R QG P++PM VS+ +N IjIJA+ S +1+ W +GL+G 
Sbjct: 127 QYLSIVGGMIVSLGLLTSLGAIVRAQGYPKIPMQVSLLINVLNAIFSALSIYVWGFGLLG 186 


Query: 185 VAVSTLIARLIGICIIAKYLPIKKIIKRMTWKISAQIWNLALPSAGERLMMRAGDVVIVA 244 

VA +T+++RL+G+ +L +++PIK++ KR+ + I++L+LP+AGERLMMRAGDV+I + 
Sbjct: 187 VAWATVLSRLVGVFLLCQFIPIKQVAKRLMRPLDKIIFDLSLPAAGERLMMRAGDVLIIG 246 

Query: 245 IWQI/STIWVaSNAIGETLTQFiraiPGLGIATATIILTAKXVGQKNRESIEETIQSSYYI 304

IW+ GT +AGNAIGETLTQFNYMPGL +ATATIIL A+ +G I 1+ ++ + 

Sbjct: 247 IVTOFGTTAIJiGKRIGETLTQFISnmPGLaMATATIILVARQLGGGKVTEIR^ 306 

Query: 305 GLVLMILISSFMLLAGKPLTQLFTNNPSAIKGSLIVILLSFVGVPATIGTLVYTAAWQGL 364 
++M+++ + L G L LFT N A + ++IV+L S +G PAT GTLVYTA WQGL

Sbjct: 307 STIiM[iaVMGaLTYIiIX3PSIiLPLFTQim3AQRSAMIVLLFSi:iICaP^^ 366 

Query: 365 GKBUCLPFYTTTIGMmijIRVVLGYLLGIVFELGIiGV^ 422 
G AKLPFY TTIGMW+IR+ LGY++G+V++ GD+GVWMAT+ DN RW Ii H+ +Y 
Sbjct: 367 GKAKLPFYATTIGMWVIRIGLGyVIGVVWQYGLIGVWMATVLnOTSRWFILSKHFKKY 424

Identities = 48/211 (22%) , Positives = 89/211 (41%) , Gaps = 29/211 (13%) 

Query: 213 MTWKISAQIWNIALPSAGERLMMRAGDWIVAIWQLGTNWAGNAIGETLTQFNYMPGL 272 
M + +I++LALPS E ++ +V +V Q+G V+G +1 + + 

Sbjct: 1 MIYMiniRKIFSMIiPSMIENILQMLMGMVDNYLVAQIGLVAVSGVSIAirailSIYQSLFI 60

Query: 273 GIATATIILTAKYVGQKNRESIEETIQSSYYIGLVLMILISSFML L 318 

+ A L A+ +G+ N+ Q +Y G++ + L+ S L L 

Sbjct: 61 ALGAAVSSLIARSIGENNQNK QLNYMAGVLQVTLLLSVGLGbLSVAGHHQVLEWL 115 


Query: 319 AGKPLTQLFTNNPSAIKGSLIVILLSFVGVPATIGTLVYTAAWQGLGNAKLPFYTTTIGM 378 

+ L +1 G +IV L G+ ++G +V + G K+P + + + 

Sbjct: 116 GAEASVTLVGGQYLSIVGGMIVSL GLLTSLGAIV RAQGYPKIPMQVSLL-I 165 

Query: 379 WLIRWLGYLLGIVFELGLLGVWMATIADNI 409

++ + L V+ GLLGV AT+ + 
Sbjct: 166 NVmAIFSALSIYVWGFGLLGVAWATVLSRL 196 

INTEGRAL Likelihood = -2.18 Transmembrane 30 - 46 ( 30 - 47)
PERIPHERAL Likelihood = 0.32 11

modified ALOM score: 2.78

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.5564(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01629(313 - 1533 of 1878)

EGAD|165726|TM0815(20 - 436 of 464) conserved hypothetical protein {Thermotoga maritima}
OMNI|TM0815 conserved hypothetical protein GP|4981345|gb|AAD35897.1|AE001748_13|AE001748
conserved hypothetical protein {Thermotoga maritima} PIR|H72331|h72331 conserved
hypothetical protein - Thermotoga maritima (strain MSB8)
%Match =13.9 

%Identity =29.4 %Similarity =53.7 

Matches = 120 Mismatches = 183 Conservative Sub.s = 99 

48 78 108 138 168 198 228 258 

YK*RRDTGFRCYFNLKRFVRCFFT*GG'XRSTKGRSNP*NGSTYLKYARHG*RVSRFETIIKIRLF*NI*SEKETP*KFSH 



M 



288 318 348 378 408 438 468 498 

HSLFNDPG**KGDTVRYSKEIIQLAIPAMIENILQMLMGVVDliyLVAQLGWAVSGVSVANNIITIYQAIFIALGASIAS 



RYSLFKNYLPKEEVPEIRKELIKLALPAMGENVLQMLFGMM)TAFLGHYSWKAMSGVGLSNQVFWWQVVLIAASMGATV 
20 30 40 50 60 70 80 



528 558 588 609 639 669 699 729 

LLAKSLAGSKKDDAISVCSQAIFLTLLIGftVL- - -GIIS IVFGQTFFKLLGTTKSVAQVGGLYLAIVGGGWTLGMLTTL 




100 110 120 130 140 150 



759 789 819 837 867 897 909 939 
GSFLRVQGQPRLPMYVSIFVNFLNAVLSGFAIF EWRYGLVGVAVSTLIARLIGICILA KYLPIKKIIKRM 




170 180 190 200 210 220 230 



969 999 1029 1059 1089 1119 1149 1179 

TWKISAQI™LALPSAGERLM^1RAGDWI^alWQLGTNWAGNAIGETLTQFlra^PGLGIATATIILTAK^ 




250 260 270 280 290 300 310 



1209 1239 1269 1299 1329 1359 1383 1413 

IEETIQSSYYIGLVLMILISSFMLLAGKPLTQLFTNNPSAIKGSLIVILLSFVGVPATIGTLVYT--AAWQGLGNAKLPF 



VLGVIRQGWILSLLFQVTVGIIIFLFPEPLIRIFTSDPQIIEISKLPV--KIIGLFQFFI1AIDSTMNGALRGTGNTLPPM 
330 340 350 360 370 380 390 



1443 1473 1503 1533 1563 1593 1623 1653 

YTTTIGMWLIRVVLGYLI<3IVFELGIJ7W»mTiaDNIFRWLFLKVHYTOYIQKM*PEMVRFFSKIIK*^ 



IITFISIWTARLPVAFVMVKYFQLGLLGAWIGMIADIIFRSTLKLLFFLSGKWEKRAVLTRERVKELG 
410 420 430 440 450 460 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2073

A DNA sequence (GBSx2188) was identified in S.agalactiae <SEQ ID 6417> which encodes the amino
acid sequence <SEQ ID 6418>. Analysis of this protein sequence reveals the following:

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2200 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD05671 GB:AE001448 THREONINE SYNTHASE [Helicobacter pylori J99]
Identities = 161/479 (33%) , Positives = 259/479 (53%) , Gaps = 17/479 (3%)

KOTASQAILKGI^JDIXMLFTPITFPKVDIiDFTKLKDASYQErvaKLVLSi^ 73 
K+ +A+L A GGL+T F L++ SY E+ + V + + L 

KIDFIEAVLNPNAPKGGLYTLEHFET- -LEWQDCLGMSYSEIiVEHVFELLNLEIPKNLLA 70 



G+FDDAQ ++K + + L + ++LS ANS+N GR+ QIVY+I+ + +L K



Query: 


14 


Sb j ct : 


13 




74 


Sbjct: 


71 


Query: 


133 


Sbjct: 


128 


Query: 


193 


Sb j ct : 


188 


Query: 


253 


Sbjct: 


248 


Query: 


312 


Sbjct: 


308 


Query: 


371 


Sbjct: 


366 


Query: 


431 


Sbjct: 


420 



YCISQAYDTKFDTTEIAPIVKIGDRYHL-ELFHGPTIAFKDMALSILPYLLTTAAKKQGV 132 
+ + Y+ + API + +R + EL+HGP++AFKDMAL L L + A G 



+ K ++L +TSGDTG A + G A +P ++ YPK+G S +Q+LQM+TQ N V + 



1+ + I ++P+GNFGN L A+YA ++GL + K+ +N N+VL +F +T YD 



K T SP+MDIL SSN+ER +F L G E T +LM+ L YAL+ ++ +L E F 



1+ ++ ++QY+ DPHTA A K ++ +TAS KF 

riQEVYAEHQYLIDPHTAT AMASLKTHEKTLVSATASYEKF 419 



A+ K+ D AA+ L + + + DL + + H+ V+ + ++ ++ 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2074 

A DNA sequence (GBSx2189) was identified in S.agalactiae <SEQ ID 6419> which encodes the amino 
acid sequence <SEQ ID 6420>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3153 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9279> which encodes amino acid sequence <SEQ ID 9280> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF46975 GB:AE002410 alcohol dehydrogenase, propanol-preferring
[Neisseria meningitidis MC58]
Identities = 202/282 (71%) , Positives = 228/282 (80%) , Gaps = 1/282 (0%)

Query: 1 MGHEGIGIVEEIGEGVTSLRVGDRVSIAWFFEGCGHCEYCTTGRETLCRSVKNAGYSVDG 60
+GHEGIG+V+E+ +GV +L+VGDRVSIAW F+ CG CEYC TGRETLCRSV NAGY+ DG
Sbjct: 60 LGHEGIGLVKEVADGVKNLKVGDRVSIAVKiFQSaSSCEYCOTQRETLCRSVim 119

Query: 61 GMSEYAIVTADYAVKVPEGLDPAQASSITCAGVTTYKAIKEAGAAPGQWIAVYGftGGLGN 120
GM+ + IV+ADYAVKVPEGLDPAQASSITCAGVTTYKAIK +G PGQWIA+YGAGGLGN
Sbjct: 120 GMATHCIVSADYAVKVPEGLDPAQASSITCAGVTTYKAIKVSGVRPGQWIAIYGRGGLGN 179

Query: 121 LAVQYAKKVFNAHWAVDINADKLQIiAKEVGADLTVNGKEIKDVAAYIQEKTGGCHGVW 180
L VQYAKKVF AHWA+DIN DKL AKE GADL VN + +D A IQEKTGG H W
Sbjct: 180 LGVQYAKKVFGAHVVAIDIMDDKI^AKETGADLVVNAAK-EimKVIQEKTGG 238

Query: 181 TAVSKmENQAIDSVRAGGTVVAVGLPSEYMELSIVKTVLDGIRVVGSLVGTRKDIiEEAF 240
TAVS AFN A++ VRAGG WA+GLP E M+LSI + VLDGI WGSLVGTRKDLEEAF
Sbjct: 239 TAVSAAAENSAVNCVRAGGRWAIGLPPESMDLSIPRLVLDGIEWGSLVGTRKDLEEAF 298

Query: 241 AFGAEGLWPWEKVPVDTAPQVFDEMERGLIQGRKVLDFTK 282
FGAEGLWP V+ +D AP +F EM G I GR V+D K
Sbjct: 299 QFGREGLWPKVQLRALDEAPAIFQEMREGKITGRMVIDMKK 340





A related DNA sequence was identified in S.pyogenes <SEQ ID 6421> which encodes the amino acid
sequence <SEQ ID 6422>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2356 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 263/280 (93%) , Positives = 273/280 (96%) 

Query: 1 MGHE6IGIVEEIGEGVTSLRVGDRVSIAWFFEGCGHCEYCTTGRETLCRSVKNA6YSVDG 60 

+GHEGIGIVEEIGEGVTSL+VGDRVSIAWFFEGa3HCEYCTTGRETnCRSVKNAGYSVDG 
Sbjct: 76 LGHEGIGIVEEIGEGVTSLKVGDRVSIAWFFEGCGHCEYCTTGRETLCRSVKNAGYSVDG 135 

Query: 61 GMSEYAIVTADYAVKVPEGLDPAQASSITCAGVTTYKAIKEAGAAPGQWIAVYGAGGLGN 120

GMSEYA+VTADYAVKVPEGLDPAQASSlTCAGVTTYKAIKEAGaAPGQWI ++GAGGLGN 
Sbjct: 136 (MSEYAVWrADYAVKVPEGLDPAQASSITCAGVTTYKAIKEa(3SVAPGQWIVIFGAG^ 195 

Query: 121 lAVQYAKKOTNAHVVAVDINADKLQLAKEVGRDLTVNGKEIKDVA^ 180 

LAVQYAKKVFNAHWAVDIN DKL+LRKEVGaD+ VNGKEI+DV YIQEKTGG HGWV 
Sbjct: 196 LAVQYAIOCVFNAHWAVDINNDKLELAKEVGADILVNGKEIEDVPGYIQEKTGGAHGVW 255 

Query: 181 TAVSKVAENQAIDSVRAGGTVVATCLPSEYMELSIVKTVLDGIRWGSLVGTRKDLEEAF 240 

TAVSKVAENQAIDSVRAGGTVVAVGLPSEYMELSIVKTVLDGI+WGSLVGTRKDLEEAF 
Sbjct: 256 TAVSKVAFNQAIDSVRAGGTVVAVGLPSEYMELSIVKTVLD6IKVVGSLVGTRKDLEEAF 315 



Query: 241 AFGAEGLWPWEKVPVDTAPQVFDEMERGLIQGRKVLDF 280
AFGAEGLV PNATEKVPVDTAP+VFDEMERGLIQGRiCVLDF
Sbjct: 316 AFGAEGLVAPWEKVPVDTAPEVFDEMERGLIQGRKVLDF 355

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
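
The summary lines that precede each alignment in this section give identities and positives as counts over the aligned length, each followed by a whole-number percentage. As a consistency check on transcribed values, the following minimal Python sketch parses such a line and recomputes both percentages. It assumes that the identity percentage is rounded down, and that the positives percentage equals the identity percentage plus the rounded-down share of non-identical positives. The second rule is only an observation that fits the figures quoted in this section, not a documented property of the alignment program. The names `parse_summary` and `summary_is_consistent` are illustrative.

```python
import re

# Matches e.g. "Identities = 263/280 (93%) , Positives = 273/280 (96%)".
SUMMARY_RE = re.compile(
    r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)\s*,\s*"
    r"Positives\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return the counts and reported percentages from one summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not an Identities/Positives summary line: %r" % line)
    ident, ident_len, ident_pct, pos, pos_len, pos_pct = map(int, match.groups())
    if ident_len != pos_len:
        raise ValueError("identities and positives use different lengths")
    return {
        "identities": ident,
        "positives": pos,
        "length": ident_len,
        "identity_pct": ident_pct,
        "positive_pct": pos_pct,
    }


def summary_is_consistent(line):
    """Check the reported percentages against the counts (assumed rounding rule)."""
    s = parse_summary(line)
    # Integer division rounds down, matching the assumed truncation.
    ident_pct = s["identities"] * 100 // s["length"]
    extra_pct = (s["positives"] - s["identities"]) * 100 // s["length"]
    return (ident_pct == s["identity_pct"]
            and ident_pct + extra_pct == s["positive_pct"])


if __name__ == "__main__":
    line = "Identities = 263/280 (93%) , Positives = 273/280 (96%)"
    print(parse_summary(line))
    print(summary_is_consistent(line))
```

A line that fails the check is a candidate for a transcription error in one of its numbers; the check cannot say which number is wrong.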

Example 2075 

A DNA sequence (GBSx2190) was identified in S.agalactiae <SEQ ID 6423> which encodes the amino 
acid sequence <SEQ ID 6424>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.82 Transmembrane 83 - 99 ( 76 - 108)
INTEGRAL Likelihood = -7.27 Transmembrane 46 - 62 ( 43 - 65)
INTEGRAL Likelihood = -7.22 Transmembrane 187 - 203 ( 182 - 209)
INTEGRAL Likelihood = -6.00 Transmembrane 243 - 259 ( 229 - 262)
INTEGRAL Likelihood = -4.25 Transmembrane 404 - 420 ( 402 - 422)
INTEGRAL Likelihood = -3.98 Transmembrane 120 - 136 ( 119 - 136)
INTEGRAL Likelihood = -3.88 Transmembrane 308 - 324 ( 307 - 324)
INTEGRAL Likelihood = -2.13 Transmembrane 378 - 394 ( 376 - 394)
INTEGRAL Likelihood = -1.38 Transmembrane 152 - 168 ( 152 - 168)
INTEGRAL Likelihood = -1.17 Transmembrane 271 - 287 ( 271 - 287)



Final Results

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9371> which encodes amino acid sequence <SEQ ID 9372> 
was also identified. 
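
The ten predicted transmembrane segments reported above for GBSx2190 are listed by likelihood score, not by position. The Python sketch below, offered only as an illustration, parses rows of the form `INTEGRAL Likelihood = ... Transmembrane start - end (...)` and reorders them along the sequence. It assumes that a more negative likelihood marks a stronger prediction, and the names `Segment`, `parse_segments` and `in_sequence_order` are hypothetical helpers, not part of any prediction program.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # score as printed in the listing
    start: int         # first residue of the core segment
    end: int           # last residue of the core segment


SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> List[Segment]:
    """Extract every INTEGRAL row; the bracketed outer range is ignored."""
    return [Segment(float(score), int(start), int(end))
            for score, start, end in SEGMENT_RE.findall(text)]


def in_sequence_order(segments: List[Segment]) -> List[Segment]:
    """Return the segments sorted from N-terminus to C-terminus."""
    return sorted(segments, key=lambda seg: seg.start)


# Values copied from the GBSx2190 listing above.
REPORT = """\
INTEGRAL Likelihood = -9.82 Transmembrane 83 - 99 ( 76 - 108)
INTEGRAL Likelihood = -7.27 Transmembrane 46 - 62 ( 43 - 65)
INTEGRAL Likelihood = -7.22 Transmembrane 187 - 203 ( 182 - 209)
INTEGRAL Likelihood = -6.00 Transmembrane 243 - 259 ( 229 - 262)
INTEGRAL Likelihood = -4.25 Transmembrane 404 - 420 ( 402 - 422)
INTEGRAL Likelihood = -3.98 Transmembrane 120 - 136 ( 119 - 136)
INTEGRAL Likelihood = -3.88 Transmembrane 308 - 324 ( 307 - 324)
INTEGRAL Likelihood = -2.13 Transmembrane 378 - 394 ( 376 - 394)
INTEGRAL Likelihood = -1.38 Transmembrane 152 - 168 ( 152 - 168)
INTEGRAL Likelihood = -1.17 Transmembrane 271 - 287 ( 271 - 287)
"""

segments = parse_segments(REPORT)
ordered = in_sequence_order(segments)
# Assumption: the most negative score is the strongest prediction.
strongest = min(segments, key=lambda seg: seg.likelihood)

if __name__ == "__main__":
    for seg in ordered:
        print(seg.start, seg.end, seg.likelihood)
```

In this listing every core segment spans 17 residues, which gives a quick sanity check on transcribed start and end positions.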

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC17857 GB:AF026147 YojI [Bacillus subtilis]
Identities = 183/432 (42%) , Positives = 266/432 (61%) , Gaps = 1/432 (0%) 



Query: 1 MKLFIPVLIYQFANFSATFIDSVMTGQYSQLHLAGVSTASNLWTPFFALLVGMISALVPV 60 

+ + IP+ I Q TF+D+VM+G+ S LAGV+ S+LWTP + L G++ A+ P+ 

Sbjct: 15 LHILIPIFITQAGLSLITFLDTVMSGKVSPADLAGVAIGSSLWTPVXTGLAGILMAVTPI 74

Query: 61 VGQHLGRGNKEQIRTEFHQFLYLGLILSLILFLIMQFIAQPVLGSLGI^EVLAVGRGYL 120 

V Q LG K++I Q +Y+ +LS+ + +1 +LG L L+ V + + +L 

Sbjct: 75 VAQLLGAEKKQKIPFTVLQAVYVAALLSIAVLVIGYAAVDLILGRLNLDIHVHQIAKHFL 134 

Query: 121 NYMLIGIMPLVLFSICRSFFDALGLTRLSMYLMLLILPPNSFFNYMLIYGKFGMPRLGGA 180 

++ +GI PL ++++ RSF D+LG TR++M + L LP N NY+ I+GKFGMP LGG 
Sbjct: 135 GFLSLGIFPLFVYTVLRSFIDSLGKTRVTMMITLSSLPINFVLNYVFIFGKFGMPALGGV 194 

Query: 181 GAGLGTSLTYWAIFIVIIIVMSLHPQIRTYHIW-TLERIKAPLIIEDIRLGLPIGLQIFA 239 

GAGL ++LTYW 11+ ++ + Y 1+ T+ + +++GLPIG +F 

Sbjct: 195 GAGLASALTYWCICIISFFIIHKNAPFSEYGIFLTMYKFSWKACKNLLKIGLPIGFAVFF 254 

Query: 240 EVAIFAWGLFMAKFSSIIIAAHQAAMNFSSLMYAFPLSISTALAITISFEVGAERFQDA 299 

E +IFA V L M+ F ++ lA+HQAAMNF+SL+Y PLS+S AL I + PE GA RF+DA 
Sbjct: 255 ETSIFAAVTLLMSHFHTVTIASHQRAMNFASLLYMLPLSVSMALTIVVGFEAGAARFKDA 314 

Query: 300 NTYSRIGRLTAVGITSGTLLFLFLFRENVAAMXNSDPHPTOITAQFLTYSLFFQFADAYA 359 

+YS IG + A+G + T + LFRE +A MY SDP + +T FL Y+LFFQ +DA A 
Sbjct: 315 RSYSLIGIMMAIGPSLFTAACILLFREQIAGMYTSDPDVLRLTQHFLIYALFFQLSDAVA 374 

Query: 360 APVQGILRGYKDTTKPFMIGAGSYWLC2y:,PLAVILEKNSQLGPFAYWIGLITGIFVCGLF 419 

AP+QG LRGYKD SYW+ LP+ ++ + LG F YWIGLI G+ + 

Sbjct: 375 APIQGAIJIGYKDVNYTLAAAFVSYWVIGLPVGYMVGTFTSLGAFGYWIGLIAGLAAGAVG 434 



Query: 420 LNQRLQKIKKLY 431
L RL K++K y
Sbjct: 435 LFFRIAKLQKRy 445

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2076 

A DNA sequence (GBSx2191) was identified in S.agalactiae <SEQ ID 6425> which encodes the amino 
acid sequence <SEQ ID 6426>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.60 Transmembrane 23 - 39 ( 23 - 39)

Final Results

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
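
Each "Final Results" block in this section assigns a certainty score to three compartments (cytoplasm, membrane, outside). As a rough illustration of how such a block can be read programmatically, the Python sketch below extracts the three scores and reports the highest one. The function names `parse_final_results` and `best_compartment` are hypothetical, and the pattern deliberately tolerates stray spaces inside the number, which scanned copies of listings like this one often contain.

```python
import re

# Compartment name, then the number after "Certainty=", up to the opening
# bracket of the verdict; spaces inside the number are allowed and removed.
RESULT_RE = re.compile(
    r"bacterial\s+(\w+).*?Certainty\s*=\s*([0-9][0-9. ]*?)\s*\("
)


def parse_final_results(lines):
    """Map each compartment name to its certainty score."""
    scores = {}
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            compartment, raw = match.groups()
            scores[compartment] = float(raw.replace(" ", ""))
    return scores


def best_compartment(scores):
    """Return the (compartment, certainty) pair with the highest certainty."""
    return max(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Values from the GBSx2191 listing above.
    block = [
        "bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>",
        "bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>",
        "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
    ]
    print(best_compartment(parse_final_results(block)))
```

Lines that do not match the pattern are skipped silently, so a block that yields fewer than three scores should be inspected by hand.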

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2077 

A DNA sequence (GBSx2192) was identified in S.agalactiae <SEQ ID 6427> which encodes the amino
acid sequence <SEQ ID 6428>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3829 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC06891 GB:AE000703 hypothetical protein [Aquifex aeolicus]
Identities = 72/213 (33%) , Positives = 115/213 (53%), Gaps = 11/213 (5%)

Query: 36 RPKILMHVCCAPCSTYTLEYLSQ---WRDVTIYFANSNIHPKDEYYRREYVTQKFVHDFN 92 

+ KIL+H+CCAP + Y L+ L + +++ YF + NIHP +Ey R T++ + 
Sbjct: 3 KSKILVHICCAPDAIYFLKKLREDYPESEIIGYFYDPNIHPYEEYRLRYLETERICKELG 62 

Query: 93 KNTGYSVQFLSAPYEPNEFFKIVHGLEEEPEGGDRCKVCYDFRLDKTAEKAVELGFDYJV3 152 

N + Y+ + + V G E+EPE G RC++C+D+RL+K+AE A EL6 D 

Sbjct: 63 IN LIEGEYDLENWLERVKGYEDEPERGKRCQICFDYRLEKSAEVAKELGCDALT 116 



Query: 153 SALTISPHKIISQTINTIGIDVQKIYDTQYLPSDLKKISIKGYQRSVEMCKDYDIYRQCYCGC 212

+ L +SP K+ + G + K ++L D +K G Q ++ K+ +IY+Q YCGC
Sbjct: 117 TTLLMSPKKSIPQLKKAGEEATKRTGIEPLAPDYRKGGGTQEMFKLSKEREIYQQDYCGC 176

Query: 213 IFGAKDQGINLLQIKKDAKAFVSDKDGKEEFPN 245
I+G Q +1 D F+ + G +E N

Sbjct: 177 1YGLFKQKNG--KIFWDI,VGPLGRRPGSKEERN 207

A related DNA sequence was identified in S.pyogenes <SEQ ID 6429> which encodes the amino acid
sequence <SEQ ID 6430>. Analysis of this protein sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3498 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 254-256 

The protein has homology with the following sequences in the databases: 

>GP:AAC06891 GB:AE000703 hypothetical protein [Aquifex aeolicus] 
Identities = 65/182 (35%) , Positives = 106/182 (57%) , Gaps = 9/182 (4%) 

Query: 39 RPSILMHVCCAPCSTYTLE'XLTQF ADITVYEftNSNIHPKDEYHRRAYVTQQPVSEFN 95 

+ IL+H+CCRP + Y L+ L + ++I YP + NIHP +EY R T++ E 
Sbjct: 3 KSKILVHICCaPDAIYFLKKLREDYPESEIIGYFYDPNIHPYEEYRLRYLETERICKELG 62 

Query: 96 AKTGNTVQFLEADYVPNEYVRQVRGLEEEPEGGDRCRVCFDYRLDKTAQKAVELGFDYFA 155 

+ +E +Y ++ +V+G E+EPE G RC++CFDYRL+K+A+ A ELG D 
Sbjct: 63 INLIEGEYDLENWLERVKGYEDEPERGKRCQICFDYRLEKSAEVAKELGCDALT 116 

Query: 156 SALTISPHKNSQTINDVGIDVQKVYTTKYLPSDFKKNNGYRRSVEMCEEYDIYRQCYCGC 215 

+ L +SP K+ + G + K ++L D++K G + . ++ +E +IY+Q YCGC 
Sbjct: 117 TTLLMSPKKSIPQLKKAGEEATKRTGIEFLAPDYRKGGGTQEMFKIiSKEREIYQQDYC 176 

Query: 216 VY 217 
+Y 

Sbjct: 177 lY 178 



An alignment of the GAS and GBS proteins is shown below.

Identities = 184/255 (72%) , Positives = 219/255 (85%)

Query: 1 MIDVENILEKMKPNQKINYDWVMQQMVKQWQASDIRPKILMHVCCAPCSTYTLEYLSQWA 60
MID++ IL M ENQKINYD VMQQM K W+ +RP ILMHVCCAPCSTYTLEYL+Q+A
Sbjct: 4 MIDLQEILAMMNENQKINYDRVMQQMAKVWEKESTOPSIIJIHVCCAPCSTYTLEYLTQFA 63

Query: 61 DVTIYFANSNIHPKDEYYRREYVTQKFVHDFNKNTGYSVQFLSAPYEPNEFFKIVHGLEE 120
D+T+YFANSNIHPKDEY+RR YVTQ+FV +FN TG +VQFL A Y PNE+ + V GLEE
Sbjct: 64 DITVYFANSNIHPKDEYHRRAYVTQQFVSEFNARTGNTVQFLEftDYVENEYVRQVRGLEE 123

Query: 121 EPEGGDRCKVCYDFRLDKTAEKAVELGFDYFGSALTISPHKNSQTINTIGIDVQKIYDTQ 180
EPEaGDRC+VC+D+RIiDK.TA+KAVKLGFDYF SALTISPHKNSQTIN +GIDVQK+Y T+
Sbjct: 124 EPEGGDRCRVCFDYRLDKTAQKAVELGFDYPASALTISPHKNSQTINDVGIDVQKVYTTK 183

Query: 181 YLPSDLKKNKGYQRSVEMCKDYDIYRQCYCGCIFGAKDQGINLLQIKKDAKAFVSDKDGK 240
YLPSD KKN GY+RSVEMC++YDIYRQCYCGC++ AK QGI+L+Q+KKDAKAF++DKD
Sbjct: 184 YLPSDFKKNNGYRRSVEMCEEYDIYRQCYCGCTOAAKMQGIDLVQVKKDAKAFMADKDLD 243

Query: 241 EEFPNIRFTFNGKSM 255
+F +IRF++ G M
Sbjct: 244 NDFTHIRFSYRGDEM 258





Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2078 

A DNA sequence (GBSx2193) was identified in S.agalactiae <SEQ ID 6431> which encodes the amino
acid sequence <SEQ ID 6432>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.4216 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14809 GB:Z99118 excinuclease ABC (subunit C) [Bacillus subtilis]

Identities = 189/333 (56%) , Positives = 244/333 (72%) 

Query: 1 MNELIKHKLELLPDSPGCYLHKDKNGTIIYVGKAKNLKNRVKSYFHGSHNTKTELLVSEI 60 
MN+ +K KL LLPD PGCYL KD+ T+IYVGKAK LKNRV+SYF GSH+ KT+ LV+EI 
Sbjct: 1 MNKQLKEKI^LPDQPGCYLMKDRQQWIYTOKAKVLKmVRSYFTGSHnACT 60

Query: 61 EDFEYIVTTSNTEALLriEIKn:.IQENMPKYNIRLKI)DKSYPYIKITlSERYPRM 120 

EDFEYIVT+SN EAL+LE+NLI+++ PKYN+ LKDDK+YP+IK+T+ER+PRL++TR VKK 
Sbjct: 61 EDFEYIVTSSNLEALILEMNLIKKHDPKYNVMLKDDKTYPFIKLTHERHPRLIVTRNVKK 120 


Query: 121 SDGTYFGPYPDSGAATEIKRLLDRLFPFKKCTNPANKVCFYYHLGQCNAHTVCQTNKAYW 180 

G YFGPYP+ AA E K+LLDRL+P +KC+ ++VC YYHLGQC A V ++ 
Sbjct: 121 DKGRYFGPYPOTQAARETKKliIiDRLYPLRKCSKLPDRVCLYYHLGQCLAPCVKDISEETN 180 

Query: 181 DSLREDVKQFlNGKimiVNGLTEKMKSAAMTMEFERaAEYRDLIEAlSLLRTRQRVIHQ 240

L E + +FL G N++ L EKM AA +EFERA E RD I I RQ++
Sbjct: 181 REL^mSITRFIJRGGY]SIEVKKELEEKIfflEAAENLEFERAKEIlRDQIAHIEST^^ 240

Query: 241 DMKDRDVFGYFVDKGWMCVQVFFVRNGKLIQRDVNMFPYYNEPEEDFLTYIGQFYQDTKH 300 
D+ DRDVF Y DKGWMCVQVFF+R GKLI+RDV+MFP Y E +E+FLT+IGQFY H

Sbjct: 241 DLVDRDVFAYAYDKGWMCVQVFFIRQGKLIERDVSMFPLYQEADEEFLTFIGQFYSKNNH 300 

Query: 301 FIiPKEVFIPQDIDAKSVETIVGCKIVKPQRGKR 333 
PLPKE+ +P ID +E ++ + +P++G + 
Sbjct: 301 FLPKEILVPDSIDQSMIEQLLETNVHQPKKGPK 333

There is also homology to SEQ ID 2568. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2079

A DNA sequence (GBSx2194) was identified in S.agalactiae <SEQ ID 6433> which encodes the amino
acid sequence <SEQ ID 6434>. This protein is predicted to be maltose operon transcriptional repressor
(rbsR). Analysis of this protein sequence reveals the following:

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3761 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9393> which encodes amino acid sequence <SEQ ID 9394> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD02112 GB:AF039082 putative maltose operon transcriptional
repressor [Lactococcus lactis]
Identities = 64/166 (38%) , Positives = 105/166 (62%) , Gaps = 13/166 (7%)

Query: 1 MGKSAiDYLYKKGHKSIQFVTDDLNSEVSEERYLGYFKGARKLGLNQKPALLFDRGNPQV 60

+G+ A+ L + H++I FVTD +EV EERY G+ A +LGL+ LLF N + 
Sbjct: 169 LGREAWLLAQIi^ffiQMISFVTDTKETEVPEERyQGFKDEAERLGLSHD--LLPMDSNPSL 226

Query: 61 LEEFiNRVKEEETTALIVIGDTVSVRVMQFLSFYKLKVPDDISIMTFISINSLFSHLIHPYL 120 

E TAL+V+ D +S++V++ L L VP+D+S++T+MNS+F +IHPYL 

Sbjct: 227 RNE TALWMDDVLSLKWERLRSQGIiNVPEDVSLITYNNSIFGaMIHPYL 276 

Query: 121 STFDINVNNLGRTSVRRLIDIIKSPDKVFSETIIVPFTLEERESVR 166 

+TFDI++ LG +++++++D+ + + + +TII PF L RES + 
Sbjct: 277 TTFDIHIEQLGASAIKKILDIiRDNKENLPEKTII-PFELIVRESTK 321 

There is also homology to SEQ ID 5082. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2080 

A DNA sequence (GBSx2195) was identified in S.agalactiae <SEQ ID 6435> which encodes the amino 
acid sequence <SEQ ID 6436>. This protein is predicted to be 4-alpha-glucanotransferase (malQ). Analysis 
of this protein sequence reveals the following: 

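Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2003 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>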

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA26923 GB:J01796 amylomaltase [Streptococcus pneumoniae]
Identities = 250/500 (50%) , Positives = 329/500 (65%) , Gaps = 4/500 (0%)

Query: 1 MKKRASGVLMHITSLPGDLGIGTFGREAYAFVDFLVETDQKFWQILPLTTTSFGDSPYQS 60
MKKR SGVLMHI+SLPG GIG+FG+ AY FVDFLV T Q++WQILPL TS+GDSPYQS
Sbjct: 1 MKKRQSGVLMHISSLPGAYGIGSFGQSAYDFVDFLVRTKQRYWQILPLGATSYGDSPYQS 60

Query: 61 FSAVAGNTHLIDFDLLTLEGFISKDDYQNISPGQDPEVVDYAGLFEKRRPVLEKAVKNFL 120
FSA AGNTH ID D+L +G + D + + FG D VDYA ++ RRP+LEKAVK F
Sbjct: 61 FSAFAGNTHFIDLDILVEQGLLEASDLEGVDFGSDASEVDYAKIYYARRPLLEKAVKRFF 120

Query: 121 QEERATRM^SDFLQE-EKW\m>FAEFMAIKEHFGNKALQEWDDKAIIRREEEALAGYRQK 179
E + F Q+ + W+ FAE+MAIKE+F N A EW D R+ AL YR++
Sbjct: 121 -EVGDVKDFEKFAQIOTQSWLELFAEYMAIKEYFDNLAVraMPDftnARARKASALESYREQ 179

Query: 180 LSEVIKYHEVTQYFFYKQWFELKEYANDKGIQIIGDMPIYVSADSVEVWTMPELFKLDRD 239
+ YH VTQYFF++QW +LK YAND I+I+GDMPIYV+ DS ++W P LFK D +
Sbjct: 180 LADKLVYHRVTQYFFFQQWLKLKAYANDNHIEIVGDMPIYVAEDSSDMWANPHLFKTDVN 239

Query: 240 KQPLAIAGVPADDFSDDGQLWGIWIYNWDYHKESDFDWWIYRIQSGVKMYDYLRIDHFKG 299
+ lAG P D+FS GQLWGNPIY+W+ + + WWI R++ K+YD +RIDHF+G
Sbjct: 240 GKATCIAGCPPDEFSVTGQLWGNPIYDWEAMDKDGYKWWIERLRESFKIYDIVRIDHFRG 299

Query: 300 FSDYWEIRGDYQTAlSroGSWQPAPGPELFATIKEKLGDLPIIAENLGYIDERAERLLAGTG 359
F YWEI TA G W PG +LFA +KE+LG+L IIAE+LG++ + L TG
Sbjct: 300 FESYWEIPAGSDTAAPGEWVKGPGYKLFAAVKEELGELNIIAEDLGFMTDEVIELRERTG 359

Query: 360 FPGMKIMEFGFYDTTGNSIDIPHNYTENTIAYAGTHDNEVINGWFEN-LTVEQKAYAENY 418
FPGMKI++F F + SID PH N++ Y GTHDN + GW+ N + + Y Y
Sbjct: 360 FPGMKILQFAF-NPEDESIDSPHIAPANSVMYTGTHDNNTVIXSWYRlffilDraT^ 418



Certainty=0.2003 (Affirmative) < succ>
Certainty=0.0000 (Not Clear) < succ>
Certainty=0.0000 (Not Clear) < succ>

Query: 419 MRRLP^^EPITETVLRTLYATVSQTTITCMQDLIlDKPMSR^lM«lPlm/GG^^WQWRTO 478

R E + +LRT++++VS I MQDLL+ _+RMN P+T+GGNW WRM ++ L 
Sbjct: 419 TNRKEYETVVHft^^^RTVFSSVSFMAIATMQDLLEIJDEaAIaWFPSTLGC^ 478 

Query: 479 TEajRKAFLKEITTIXlSIRGNK 498

T + L ++TTiy R N+ 
Sbjct: 479 TPAVEEGLLDLTTIYRRINE 498 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6437> which encodes the amino acid
sequence <SEQ ID 6438>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.85 Transmembrane 435 - 451 ( 435 - 451)

Final Results

bacterial membrane --- Certainty=0.1341 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 313/495 (63%) , Positives = 387/495 (77%) 

Query: 1 MKKRASGVLMHITSLPGDLGIGTFGREAYAFVDFLVETDQKFWQILPLTTTSFGDSPYQS 60 
M KRASG+LMHI+SLPG GIGTFG+ A+ FVDFL ET Q +WQILPLTTTSFGDSPYQS 
Sbjct: 1 MSIKRASGIIiMHISSLPGKFGIGTFGKSAFEFVDFIiAETKQTYWQILPLTTTSFGDSPYQS 60

Query: 61 FSAVAGNTHLIDFDLLTLEGFISKDDYQNISFGQDPEWDYAGLFEKRRPVLEKAVKNFL 120 

FSA+AGNTH IDF+LL + + D +I+F6 +PE VDYA LF+ RRP+LEKAV+ F+ 
Sbjct: 61 FSAIAtSNTHFIDFELLVDDELLEAADLCDITFGTNPEAVDYaQLFQVRRPLLEKAVRAFV 120 


Query: 121 QEERATRMLSDFLQEEKWVTDFAEFMAIKEHFGNKALQEWDDKAIIRREEEALftGYRQKL 180 

E+ L F W+TDFAEFMA+KE+F NKALQ+WDD+ +I+R+E++L YR+ L 

Sbjct: 121 MQENVCKLEAFETASSWLTDFAEFMALKEYENNKALQDWDDEWIKRQEDSUmYRELL 180 

Query: 181 SEVIKYHEVTQYFFYKQWFELKEYANDKGIQIIGDMPIYVSADSVEVWTMPELFKLDRDK 240

++ I YH+V QYFFY+QW LK YAN KGI+IIGDMPIYVSADSVEVWTMPELFK+D DK 
Sbjct: 181 AKKITYHKVCQYFFYQQWSftLKIYANHKGIEIIGDMPIYVSADSVEVWTMPELFKVDSDK 240 

Query: 241 QPIAIAGVPADDFSDDGQLVOIPIYNWDYHKESDFDWWIYRIQSGVKMYDYLRIDHFKGF 300
+PL lAGVPAD FS+DGQLVJGNP YNW H++S+F WWIYRIQ K+YD LRIDHFKGF

Sbjct: 241 KPLFIAGVPADGFSEDGQLWGNPTYNWSftHEKSNFAWWIYRIQESFKLYDQLRIDHFKGF 300 

Query: 301 SDYWEIRGDYQTANDGSWQPAPGPELFATIKEKLGDLPIIAENLGYIDERAERLLAGTGF 360 
SD+WEI +TA +G W AP6 I.F+ ++E LG+LPIIAENLGYIDE+AE+LLA TGF 
Sbjct: 301 SDFWEIPAGDKTARHGHWRSAPGIALFSAVREALGELPIIAENLGYIDEKAEQLLASTGF 360

Query: 361 PGMKI^ffiFGFYDTTGNSIDIPHNYTENTIAYAGTH^NEVINGWFENLTVEQKAYAENYMR 420 

PGMKI+EFG +D T SID+PH Y' N +AY 6THnNEV+NGW++NL+ EQ + NY+ 
Sbjct: 361 PGMKILEFGLFDITSQSIDLPHYYDRNCVAYTGTHDNEWNGWYDNLSEEQVHFVNNYLH 420 


Query: 421 RLPNEPITETVLRTLYAWSQTTITCMQDLLDKPADSRMNMP^rIVGGNWQWRMRKEDLTE 480 

+ +E IT+ +LRT++A+V T I C+QDLLDK SRMNMPNT+GGNWQWRM +L + 
Sbjct: 421 KHADESITKflMLRTIFASVOJTAILCIQDLLDKDGKSRMimPNTIGGNWQWRMLDGELNQ 480 

Query: 481 MRKAFLKEITTIYNR 495

+ K +L +T +Y R 
Sbjct: 481 DHKDYLIYLTDLYGR 495 
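
The identity and positive counts quoted for each alignment can be recomputed from the aligned rows themselves. The Python sketch below illustrates this on the final block of the alignment above. It assumes the usual convention for this alignment layout: the middle row repeats the residue wherever query and subject are identical and shows `+` at a conservative substitution. The helper names `count_identities` and `midline_counts` are illustrative only.

```python
def count_identities(query, subject):
    """Count positions where two equal-length aligned rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # Gap characters ("-") never count as identities.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def midline_counts(midline):
    """Return (identities, positives) as encoded in the middle row."""
    identities = sum(1 for ch in midline if ch.isalpha())
    # Positives include identities plus the "+" positions.
    positives = identities + midline.count("+")
    return identities, positives


if __name__ == "__main__":
    # Final block of the GAS/GBS alignment above (residues 481 - 495).
    query = "MRKAFLKEITTIYNR"
    subject = "DHKDYLIYLTDLYGR"
    midline = "+ K +L +T +Y R"
    print(count_identities(query, subject))
    print(midline_counts(midline))
```

Summing these per-block counts over a whole alignment should reproduce the totals in its summary line, provided the rows were transcribed without error; for heavily garbled rows the comparison is only indicative.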






Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2081

A DNA sequence (GBSx2196) was identified in S.agalactiae <SEQ ID 6439> which encodes the amino 
acid sequence <SEQ ID 6440>. This protein is predicted to be glycogen phosphorylase (malP). Analysis of 
this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2678 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC00218 GB:AF008220 glycogen phosphorylase [Bacillus subtilis]
Identities = 297/776 (38%) , Positives = 452/776 (57%) , Gaps = 41/776 (5%)

Query: 13 GKVLSELTNEEIYVELLNFVKEEAaA KSKNSSQRKVYYISAEFLIGKLLSNNL 65
GK + + Y L N V+E +A KS+++S ++ YY+S EFL+G+LL NL
Sbjct: 21 GKSFKDSAKLDQYKTLGNMVREYISftDWIETlSIEKSRSNSGKQTYYLSIEPLLGQLLEQNL 80

Query: 66 INLGIYKDVKKELELVGKSIAEIEDVEPEPSLGNGGLGRIiASCFIDSISSLGINGEGVGL 125
+NLG+ V+ L+ +G ++ EI +E + LGNGGI1GRIA+CF+DS++SL + G G+G+
Sbjct: 81 MNLGVRDyVEaGLKEIGINLEEILQIENDaGLGNGGLGRLaACFLDSLASLNLPGHGMGI 140

Query: 126 NYHCGLFKQVFRNNQQEAEANYWIEN - NSWLVPT - D I SYDVPF RDFTLKSRL 175
Y GLF+Q + Q W++N N W V D + DVPF + L R
Sbjct: 141 RYKHGLFEQKIVDGHQVELPEQWLKNGNVWEVRNADQAVDVPFWGEVHMTEKSGRLHFRH 200

Query: 176 DR IDVLGYKKDTKNYIJSmFDIDGLDYNLIEKGITFDKTEIKKNLTIiFLYP 225
++ I ++GY+ T N L L++ + Y G + ++ FLYP
Sbjct: 201 EQATIVTAVPYDIPI IGYETGTVNTLRLWNAE- - PYAHYHGGNILSYKRETEAVSEFLYP 258

Query: 226 DDSDKNGELLRIYQQYFMVSNAAQLLIDEAIERGSNLHDLAEYAYVQINDTHPSMVIPEL 285
DD+ G++LR+ QQYF+V + + +++ + +L L + + INDTHP++ +PEL
Sbjct: 259 DHnmEGKILRLKQQYFLVCASLKSIVNNYRKTHKSLSGLHKKVSIHINDTHPALAVPEL 318

Query: 286 IRLLTEKHGFEFDEAVSVVRNMVGYTNHTILAEALEKWPLEYLNEVVPHLOTIIKKLDQM 345
+R+L ++ ++EA + + + YTNHT L+EALEKWP+ ++P + II+++++
Sbjct: 319 MRILLDEENMSWEEAVffllTVHTISYTOHTTLSEALEKWPIHLFKPLLPRMYMIIEEINER 378

Query: 346 IRE EQTNPEVQIIDEAGRVHMAHMDIHFSTSVNGVAMjHTEILKNSELKVFY 397
+ E I G V MAH+ I S SVNGVA +H++ILK E++ F+
Sbjct: 379 FCRAVWEKYPGDWKRIENmiTAHGVVKMAHLAIVGSYSVNGVAKIHSDILKEREMRDFH 438

Query: 398 DIYPDKFISnSTKITSrGITFRRWLEFANQDLADYLKELIGDSYLTnATQLEKLLTYADSNEViro 457
++P++FKNKTNGI RRWL AN L+ + E I6D ++ L +L YA +
Sbjct: 439 LLFPNRENNKOTGIAHRRWLLKftNPGLSAIITEAIGDEWVRQPESLIRLEPYATDPAFIE 498

Query: 458 KLAAIKFKNKLALKRYLKENKGIELDEYSIIDTQIKRFHEYKRQQMNALYVIHKYLEIKR 517
+ K K K L + G+ ++ SI D Q+KR H YKRQ +N L++++ Y +K
Sbjct: 499 QFQNNKSKKKQELADLI FCTAGWVNPES I PDVQVKRLHAYKRQLLNVLHIMYLYNRLKE 558

Query: 518 GH-FPSRKLTVIFGGKRAPAYTIAQDIIHLILCLSELINNDPEVNKYLNVHLVENYNVTV 576
F T IFG KA+P+Y A+ II LI ++E +N DP V + + V +ENY V++
Sbjct: 559 DSGFSIYPQTFIFGAKRSPSYYYAKKIIKLIHSVAEKVNVDPAVKQLIKVVFLENYRVSM 618

Query: 577 AEKLIPATDISEQISLASKEASGTGNMKFMLNGALTLGTMDGANVEIAELAGKENIYTFG 636
AE++ PA+D+SEQIS ASKEASGTGNMKFM+NGALT+GT DGAN+EI E G + lYTFG
Sbjct: 619 AERIFPASDVSEQISTASKEASGTGNMKFMMNGALTIGTHDGANIEILERVGPDCIYTFG 678



Query: 637 KDSDTIINLYETSGYRSKDYYDKDKVIREAVDFIISDDIVSLGNAERLKRLHDELV-GKD 695 

+D +++ E GYRS++YY D+ IR+ D +1+ G A+ + + D L+ D 

Sbjct: 679 LKADEVLSYQENGGYRSREYYQHDRRIRQVADQLINGFFE--GEADEFESIFDSI1LPHND 736 



Query: 696 WFMTLIDLKEYIAVKEQVLADYEDYESWNKKVIHNIAKAGFFSSDRTIEQYNQDIW 751

+ L D Y +E++ ADY + W++ I NIA +G+FSSDRTI +Y +DIW 
Sbjct: 737 EYFVLKDFSSYADAQERIQADYRERRKWSEHSIWIAHSGYFSSDRTIREYAKDIW 792

A related DNA sequence was identified in S.pyogenes <SEQ ID 6441> which encodes the amino acid
sequence <SEQ ID 6442>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.71 Transmembrane 538 - 554 ( 538 - 554)

Final Results

bacterial membrane --- Certainty=0.2084 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 629/754 (83%) , Positives = 696/754 (91%) , Gaps = 2/754 (0%) 

Query: 1 MTRNFTTYVGQQ-GKVLSELTNEEIYVELtNFVKEEAAAKSKNSSQRKVYYISAEFLIGK 59 
MTR FT YV + GK +NEEIY+ LLNFVKEEA+ K+KNS++RKVYYISAEFLIGK

Sbjct: 1 MTR-FTEYVETKLGKSLTQASNEEIYLSLmFVKEEASHKAKNSAKRKVYYISAEFLIGK 59 

Query: 60 LLSNNLINIjGIYKDVKKELEIiVGKSIAEIEDVEPEPSLGNGGLGRLASCFIDSISSLGIN 119 
LLSNNLINLGIYKD+K+EL GKSIAE+EDVE EPSLGNGGLGRIiASCFIDSI+SLGIN 
Sbjct: 60 LLSNNLINIfilYKDIKEEIJWiGKSIMWEDVELEPSLGNGGLGRLaSCFIDSIASLGIN 119

Query: 120 GEGVGLNYHCGLFKQVFRITOJQQEAEANYWIENNSWLVPTDISYDVPFRDFTLKSRLDRID 179 

GEGVGMJYHCGLFKQVF+H-N+QEAE N+WIE++SWLVPTDISYDVPF++FTLKSRLDRID 
Sbjct: 120 GEGVGUraiCGLFRQVFKHNEQEREHSIFWIEDDSWLVPTDISYWPFK^ 179 


Query: 180 VLGYKKDTKNYIJSn^FDIDGLDYNLIEKGITFDKTEIKKNLTLFLYPDDSDKNGEL^^ 239 

VLGYK+DTKNYUSILFDI+G+DY LI+ GI+FDKT+I KNLTLFLYPDDSDKNGELLRIYQ 
Sbjct: 180 VLGYKRDTKiraMLFDIEGVDYGLIKIXSISFDKTQIAKNLTLFLYPDDSDKNGELLRIYQ 239 

Query: 240 QYFMVSNAAQLLIDEAIERGSNLHDLREYAYVQINDTHPSMVIPELIRLLTEKHGFEFDE 299

QYFMVSNAAQL+IDEAIERGSNLHDLA+YAYVQINDTHPSMVIPELIRLLTEKHGF+FDE 
Sbjct: 240 QYFMVSNAAQLIIDEAIERGSNLHDLADYAYVQINDTHPSMVIPELIRLLTEKHGFDFDE 299 

Query: 300 AVSVVRNMVGYTNHTILREMEKWPLEYIJlffiVVPHLVTIIKKI^QMIREEQTO 359 
AV+W+NMVGYTNHTILAEALEKWP YLNEWPHLVTII+KIiD ++R E ++P VQIID

Sbjct: 300 AVAVVKNMVGYTiraTILAEALEKmPTAYENEVVPHLOTIlEKI^^ 359 

Query: 360 EAGRVHMAHMDIHFSTSVNGVAALHTEIDKNSELKVFYDIYPDKENNKTNGITFRRWLEF 419 
E+GRVHMAHMDIHF+TSVNGVAALHTEILKNSELK FYD+YP+KFNNKTNGITFRRWLEF 
Sbjct: 360 ESGRVHMAHroiHFATSVNGVAW^EILKNSELKAFYDLyPEKEraKTNGITFRRWLEF 419

Query: 420 ANQDLADYLKELIGDSYLTDATQLEKLLTYADSNEVHDKLAAIKI'KNKUiLKRYLK^^ 479 

ANQDLADY+KELIGD YLTDAT+LEKL+ +AD VH KLA IKF NKLALKRYIiK+NK 
Sbjct: 420 ANQDLADYIKELIGDEYLTDATKI^KLMAFADDKAVHAKLAEIKENNKL^ 479 


Query: 480 lELDEYSIIDTQIKRFHEYKRQQMNALYVIHKYLEIKRGHFPSRKLTVIFGGKAAPAYTI 539 

IELDE+SIIDTQIKRFHEYKRQQMNALYVIHICyLEIK+G+ P RK+TVIFGGKAAPAY I 
Sbjct: 480 lELDEHSIIDTQIKRFHEYKRQQMNALYVIHKYLEIKKGNLPKRKITVIFGGKAAPAYII 539 

Query: 540 AQDIIHLILCLSELIH^roPEVNKyIJTOHLVENYNVT\«VEIg^IPATDISEQISI*S 599

AQDIIHLILCLSELINNDPEV+ YIiNVKLVENYNVTVAE LIPATDISEQISLASKEASG 
Sbjct: 540 AQDIIHLILCLSELINNDPEVSPYIiNVHLVENYNVTVAEHLIPATDISEQISLASKEASG 599 

Query: 600 TGImKFMMJGMTLGT^mGAlWEIAEIiAGKENIYTFGKDSDTIINLYETSGYRSKDYYDK 659 
TGNMKFMIjNGALTLGTMDGANVEIAELAG ENIYTFGKDSDTIINLY T+ Y +KDYYD

Sbjct: 600 TGNMKFMLNGALTLGTMDGANVEIAEIAGMENIYTFGKDSDTIINLYATASYVA^ 659 

Query: 660 DKVIREAVDFIISDDIVSLGNAERLKRLHDELVGKDWFMTLIDLKEYIAVKEQVLftDYED 719 
1+ AV+FIIS ++++ GN ERL RL+ EL+ KDWFMTLIDL+EYI VKE+H-LADYED 
Sbjct: 660 HPAIKAAVNFIISPELLAFGNEERLDRIiYKELISKDWFMTLIDLEEYIEVKEKMLADYED 719

Query: 720 YESWNKKVIHNIAKAGFFSSDRTIEQYNQDIWHS 753

+ W KV+HNIAKAGFFSSDRTIEQYN+DIWHS 
Sbjct: 720 QDLWMTKWHNIAKAGFPSSDRTIEQYNEDIWHS 753 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2082 

A DNA sequence (GBSx2197) was identified in S.agalactiae <SEQ ID 6443> which encodes the amino
acid sequence <SEQ ID 6444>. This protein is predicted to be glycerol-3-phosphate transporter (glpT).
Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.28 Transmembrane 407 - 423 ( 406 - [...])
INTEGRAL Likelihood = [...].02 Transmembrane 165 - [...] ( 165 - 182)
INTEGRAL Likelihood = -0.64 Transmembrane 29 - 45 ( 29 - 45)



Final Results

bacterial membrane -- Certainty=0. 5352 (Affirmative) < suco
bacterial outside -- Certainty=0. 0000 (Not Clear) < suco
bacterial cytoplasm -- Certainty=0. 0000 (Not Clear) < suco



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44575 GB:U28354 IS629 ORFB fused with sequences similar to E.
coli GlpT and UhpT proteins, Swiss-Prot Accession Number
P08194 and P09836; Method: conceptual translation
supplied by author [Shig
Identities = 174/321 (54%) , Positives = 241/321 (74%) , Gaps = 4/321 (1%) 



Query: 109 GVIPSVITSIWLFTIMYLINGWLQGMGYPPGARTLVYWYDNKERIKYATIWNLSHNFGGA 168 

GV P V + + + YL+NGW+QGMGYPPGA+TLV+WY+++ERI +AT+WNLSHN GGA 
Sbjct: 12 GVGP-VCSELHIAPSTYLLNGWIQGMGYPPGAKTLVFWYEHRERISWATLWNLSHNVGGA 70 

Query: 169 lAPILTGVGLALAGNDSLNQRRAAYWFPGWACLLAVLVYFLQEDTPESIGLPPIEEyHK 228 

+AP+L G G+ +L+ ARAA+ FPGV+ ++VL+YF+Q D P S+GLPPIEE+ 

Sbjct: 71 LAPVLIGFSFGFFGDSALDHARAAFIFPGVLCMAMSVLIYFIQVDRPVSVGLPPIEEWKG 130 

Query: 229 EQYTNVVDSSDILEEPEVLGMGEIIKKYILPNTKLMWASLYSIFVYILRYGIVSWTPKFL 288 

++ E+ L + +II+K+I+ N KL++ +Y FVYILRYGIVSW PKFL 

Sbjct: 131 NWSHPAKGR---EQGPRLSIPDIIRKHIIRNNKLIYCCIYGSFVYILRYGIVSWAPKFL 187 

Query: 289 ATSVQDGGKGITATAGMGGFSLFEIGGIIGMLTAGYLSAKVFKNSKPLTNVAFLWAILL 348 

+ S+ GGK + A MGG S+FEIGG+ GML AGYLS ++F+NSKPLTN FL + I+L 
Sbjct: 188 SDSLDVGGKDMGKLASMGGGSVFEIGGVAGMLLAGYLSVRLPRNSKPLTNTLFLALTIIL 247 

Query: 349 LAAYWFIPAGPQYMALDFIILLGLGASIYGPVMMVGLYAMELVPKAAAGAASGLTGTFSY 408 

L AYW++P+G +Y+ L++ IL+ LG ++YGPVM +GLY+MELVPK AAGAASGL+GTFSY 
Sbjct: 248 LIAYWYVPSGNEYLWLNYTILILLGLAVYGPVMFIGLYSMELVPKEAAGAASGLSGTFSY 307 



Query: 409 VGGATIATLAIGIIIDHFGWG 429

+ G+ +ATL +G+++D+ GWG
Sbjct: 308 IPGSIVATLGMGLWDYLGWG 328
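
The summary line that precedes each alignment (Identities, Positives, Gaps) can be read mechanically. The following is a minimal sketch, not part of the original analysis; the function name `parse_stats` and the dictionary keys are illustrative assumptions. It keeps both the printed percentage and the recomputed ratio, because the two need not agree exactly (for the hit above, 241/321 is about 75.1% against a printed 74%).

```python
import re

# Matches e.g. "Identities = 174/321 (54%)"; tolerant of the stray spaces in the listing.
STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_stats(line):
    """Return {name: {count, length, printed_pct, ratio}} for one summary line."""
    stats = {}
    for name, num, den, pct in STAT.findall(line):
        stats[name] = {
            "count": int(num),        # number of aligned positions in this class
            "length": int(den),       # alignment length
            "printed_pct": int(pct),  # percentage as printed in the listing
            "ratio": int(num) / int(den),  # recomputed fraction
        }
    return stats


line = "Identities = 174/321 (54%) , Positives = 241/321 (74%) , Gaps = 4/321 (1%)"
stats = parse_stats(line)
for name, s in stats.items():
    print(f"{name}: {s['count']}/{s['length']} = {s['ratio']:.1%} (printed {s['printed_pct']}%)")
```

Summary lines without a Gaps field, which also occur in the listings, simply yield a dictionary with two entries.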
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A related DNA sequence was identified in S.pyogenes <SEQ ID 6445> which encodes the amino acid 
sequence <SEQ ID 6446>. Analysis of this protein sequence reveals the following: 

Possible site: 36 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.79 Transmembrane [...] 419 [...]
INTEGRAL Likelihood = -6.37 Transmembrane 91 - [...] ( 90 - [...])
INTEGRAL Likelihood = -5.36 Transmembrane [...] ( [...] - 181)


INTEGRAL Likelihood = -5.20 Transmembrane 350 - 366 ( 347 - 371)
INTEGRAL Likelihood = -4.41 Transmembrane 23 - 39 ( 22 - 41)
INTEGRAL Likelihood = -3.77 Transmembrane 257 - 273 ( 249 - 273)
INTEGRAL Likelihood = -1.33 Transmembrane 61 - 77 ( 61 - 77)
INTEGRAL Likelihood = -1.28 Transmembrane 383 - 399 ( 383 - 399)
INTEGRAL Likelihood = -0.90 Transmembrane 299 - 315 ( [...] - 315)



Final Results 

bacterial membrane Certainty=0 . 5946 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 
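
The transmembrane rows of a report such as the one above follow a fixed pattern (likelihood, predicted segment, extended segment in parentheses). A minimal sketch of reading them is given below; the names `TMSegment`, `parse_tm_rows` and `strongest` are hypothetical, and treating the most negative likelihood as the strongest prediction is an assumption based on the 0.0 threshold quoted in the ALOM lines of these reports.

```python
import re
from typing import NamedTuple


class TMSegment(NamedTuple):
    likelihood: float  # ALOM-style score; negative values are reported as INTEGRAL
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_tm_rows(text):
    """Collect every complete 'INTEGRAL Likelihood = ...' row found in text."""
    return [TMSegment(float(l), int(s), int(e)) for l, s, e in ROW.findall(text)]


def strongest(segments):
    """Return the segment with the most negative likelihood (assumed strongest)."""
    return min(segments, key=lambda seg: seg.likelihood)


report = """
INTEGRAL Likelihood = -5.20 Transmembrane 350 - 366 ( 347 - 371)
INTEGRAL Likelihood = -4.41 Transmembrane 23 - 39 ( 22 - 41)
INTEGRAL Likelihood = -3.77 Transmembrane 257 - 273 ( 249 - 273)
"""
segments = parse_tm_rows(report)
best = strongest(segments)
print(len(segments), "segments; strongest spans", best.start, "-", best.end)
```

Rows with missing values are skipped by the pattern rather than guessed.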

The protein has homology with the following sequences in the databases: 

>GP:AAF96050 GB:AE004355 glycerol -3 -phosphate transporter [Vibrio cholerae] 
Identities = 128/438 (29%) , Positives = 215/438 (48%) , Gaps = 17/438 (3%) 

Query: 1 LF^ffiED■^IreREP-EKFTQFLRRQKVVTFVAFF-6YVaiYLVRNNFKIlMSNTIMVQNGm^ 58 

LF + +R P +K R + F+ F GY YL R NF L + +++ G+ + 

Sbjct: 21 LFKPAAHTQRLPSDKVDSVYSRLRWQLFIGIFVGYAGYYLGRKNFSL-AMPYLIEQGFSR 79 

Query: 59 AQIAILLSCLTVSYGIiaKFYMGALGDRVSLRKLFSISLGRSALICILIGFF---NSSMW 115 

+ + L ++++YGL+KF MG + DR + R S L SAL+ GF S+ 
Sbjct: 80 GDLGVALGAVSIAYGLSKPLMGNVSDRSNPRYFLSAGLLLSALVMFCFGFMPWATGSITA 139 

Query: 116 LGILLVLCGWQGALAPASQAMIANYFPNKTRGGAIAGWNISQNMGSALLPLTIALLTSM 175 

+ ILL L G QG PA + +++ K RG ++ WN++ N+G L I + + 
Sbjct: 140 MFILLFLNGWFQGMGWPACGRTMVHWWSRKERGEIVSVWNVAHNVGGGL IGPIFLL 195 

Query: 176 GLWPANGNILLAFLIPGVLVFLFALCCWKLGGDNPESEGLDSLRTMYGDAGESAVASEE 235 

GL + N + AF +P L A+ W + D P+S GL + D + S E 

Sbjct: 196 GLWM-FNDDWRTAFYVPAFFAVLVAVFTWLVMRDTPQSCGLPPIEEYKNDYPDDYDKSHE 254 

Query: 236 EKHNLSYWQLIWKYVFCaSPSLLLVAAVNVALYFVRFGIEDWMPIYLSQVANMSEAHIHFA 295 

+ ++ ++ +KYVF N L +A N +Y +R+G+ DW P+YL + + + +A 
Sbjct: 255 NE--MTAKEIFFKYVFNNKLLWSIAIANaFVYLIRYGVIJ)WAPVYLKEAKHFTVDKSSWA 312 

Query: 296 ISMLEWVAIPGSLVFAWLAVR-YPNKMAKVGAIGLFVLAAIVFVYERLTATGAPNYFLLL 354 

+ EW IPG+L+ W++ +++AG++++ VVY GP + 

Sbjct: 313 YFLYEWAGIPGTLLCGWISDKVFKGRRAPAGILFMVLVTLAVLVY-WFNPAGNPAVDMAA 371 

Query: 355 VIAGILGSLIYGPQLIVNILTINFVPLNVAGTAIGFVGVTAYLIGNMGANWLMPILADGF 414 

++A +G LIYGP +++ + + ' P AGTA GG+YLG + AN++ DF 
Sbjct: 372 LVA--IGFLIYGPVMLIGLYALEI1APKKAAGTAAGLTGLFGYLGGAVAANAILGYTVDHF 429 

Query: 415 GWFWSYIWAALSAFSAV 432 

GW ■1-+V+ A S + 
Sbjct: 430 GWDGGFMVLVASCVLSVL 447 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/439 (26%) , Positives = 203/439 (45%) , Gaps = 27/439 (6%) 

Query: 23 KYPRYRVQVLISIFVGYMGYYFVRNTTSILSGILNMS ATEIGIITCASYIAYGLSK 78 

++ R + V F GY+ Y VRN ++S + + +11+ ++YGL+K 

Sbjct: 17 QFLRRQKVVFFVAFFGYVOVYLVRMNFKLMSNTIMVQNGWDKAQIAILLSCLTVSYGLAK 76 
Query: 79 FISGLISDESNSKIFLPVGLFLTC3LVNVLIGVIPSVITSIWLFTIMYLINGWLQGMGYPP 138

FG + D+ + +L+L+ +LIG S S+ + 1+ ++ G +QG P
Sbjct: 77 FYMGALGDRVSIiRKLFSISLGASftLICILIGFFNS---SMWLGILLVLCGWQGALAPA 133

Query: 139 GRRTLWWYDNKERIKYATIWNLSHNFGGAIAPI LTGVGIALAGNDSIiNQRRAAYW 194

+ ++ NK R WN+S N G A+ P+ LT +GL + N ++ A+
Sbjct: 134 SQAMIANyFENKTRGGAIAGWNISQNMGSALLPLTIALLTSMGLWPANGNI---LLAFL 190

Query: 195 FPGWACLLAVLVYFLQEDTPESIGLPPIEEYHKEQYTNWDSSDILEEPEVLGMGEIIK 254

PGV+ L A+ + L D PES GL + + + + V S EE L ++I
Sbjct: 191 IPGVLVFLPALCCWKLGGDNPESEGLDSLRTMYGDMESAVaSE---EEKHNLSYWQLIW 247

Query: 255 KYILPOT'K]jymSLYSIFVyiIiRYGIVSm'PKFIATSVQIX3GKGITATAGMGGFSLPEIG 314

KY+ N L+ + ++ +Y +R+GI W P +L+ I S+ E
Sbjct: 248 KYWajPSLLLVRAVNVALYFVRFGlEDWMPIYLSQVANMSEftHIHFA ISMLEWV 302

Query: 315 GI IGMLTAGYLSAKVFKNSKPLTNVAFLWAILLLAAYWFI PAG- PQYMALDFI ILLG-L 372

I G L +L+ + + + V+A ++ G P Y L +++ G L
Sbjct: 303 AIPGSLVFAWIAVRYPNKMaKVGaiGLFVLAAIVFVYERLTATGAPNYFLL--LVIAGIL 360

Query: 373 GASIYGPVMMVGLYAMELVPKaAAGAASGLTGTFSYVGGATIATLAIGIIIDHFGWGVAF 432

G+ lYGP ++V + + VP AG A G G +Y+ G A + 1+ D FGW ++
Sbjct: 361 GSLIYGPQLlVNILTINFVPIMVAGTAIGFVGVTAYLIGmGANWLMPILRDGFGW™^ 420

Query: 433 IIF-GISGFAAIVCTLLSR 450

1+ +S F+A+ +L++
Sbjct: 421 IWAALSAFSAVGYLILAK 439
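
In a multi-row alignment such as the one above, each Query or Sbjct row starts one residue after the previous row of the same sequence ended. That property gives a simple check for misread coordinates. The sketch below is illustrative only; `find_breaks` is a hypothetical helper, and the coordinate pairs are the Sbjct start and end values of this alignment.

```python
def find_breaks(rows):
    """Return indices of rows whose start is not the previous row's end + 1.

    rows is a list of (start, end) residue coordinates for one sequence,
    in the order the rows appear in the alignment.
    """
    breaks = []
    for i in range(1, len(rows)):
        if rows[i][0] != rows[i - 1][1] + 1:
            breaks.append(i)
    return breaks


# Sbjct coordinates of the GAS/GBS alignment above.
sbjct_rows = [(77, 133), (134, 190), (191, 247), (248, 302),
              (303, 360), (361, 420), (421, 439)]
print(find_breaks(sbjct_rows))  # an empty list means the rows are contiguous

# A misread start coordinate on the fourth row is flagged at index 3.
misread = list(sbjct_rows)
misread[3] = (240, 302)
print(find_breaks(misread))
```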





Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2083 

A DNA sequence (GBSx2198) was identified in S.agalactiae <SEQ ID 6447> which encodes the amino 
acid sequence <SEQ ID 6448>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 3202 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside — Certainty=0. 0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6449> which encodes the amino acid 
sequence <SEQ ID 6450>. Analysis of this protein sequence reveals the following:

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 4473 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco

bacterial outside — Certainty=0. 0000 (Not Clear) < suco 
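
The three "Final Results" lines can be reduced to a single predicted compartment by taking the highest certainty. The sketch below is one way to do this under stated assumptions: the names `parse_final_results` and `top_site` are hypothetical, and the pattern is deliberately loose so that the spacing and spelling noise seen in these listings (for example "Certaintyi=0 . 0000") still parses. The sample lines are written in the form printed above.

```python
import re

# Tolerates extra spaces around "=" and ".", and stray letters after "Certainty".
LINE = re.compile(
    r"bacterial\s+(?P<site>[a-z]+)\W*Certainty\w*\s*=\s*"
    r"(?P<int>\d)\s*\.\s*(?P<frac>\d+)\s*\((?P<label>[^)]*)\)",
    re.IGNORECASE,
)


def parse_final_results(text):
    """Map each localization site to its certainty value."""
    return {
        m["site"].lower(): float(f"{m['int']}.{m['frac']}")
        for m in LINE.finditer(text)
    }


def top_site(results):
    """Return the site with the highest certainty."""
    return max(results, key=results.get)


sample = """
bacterial cytoplasm Certainty=0. 4473 (Affirmative) < suco
bacterial membrane Certaintyi=0. 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_final_results(sample)
print(top_site(results), results)
```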

An alignment of the GAS and GBS proteins is shown below. 

Identities = 54/100 (54%) , Positives = 67/100 (67%) 

Query: 1 MTYELCLEYQTYPLRPVDAWADEINTAPAFITEDKKLLEIjLEEVNTLFHELFLTIE^^ 60 

MTYELCLEYGTYPL VnA+ E P FI ED+ L LE +N LFH+LP+TIE FH 
Sbjct: 1 MTYELCLEYGTYPLSRVDAYWGEDQNPPTFIQEDRLLCHKLETMNHLFHDLFVTIESQFH 60 
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Query: 61 YIGHDFPEKRAKITQIYHVIIEHLSIHYPEYDIKIESLLM 100 

Y+G + PEKRA+I +Y + L Y +Y IKIE+ L+ 
Sbjct: 61 YVGENMPEKRAQIRILYQEVATILKSKYKDYPIKIETFLL 100 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2084 

A DNA sequence (GBSx2199) was identified in S.agalactiae <SEQ ID 6451> which encodes the amino
acid sequence <SEQ ID 6452>. Analysis of this protein sequence reveals the following: 
Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 .2369 (Affirmative) < suco 

bacterial membrane — Certainty=0. 0000 (Not Clear) < suco
bacterial outside — Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB81912 GB:U92974 unknown [Lactococcus lactis] 
Identities = 213/322 (66%) , Positives = 260/322 (80%) , Gaps = 5/322 (1%) 



Query: 1 MSEKIRVLLYYKYVSIENAEEYAAKHLEFCKSIGLKGRILIADEGINGTVSGDYETTQKY 60

M++ RVLLYY+YV IE+ E +A KHL CK +GLKGRIL+ADEGINGTVSG E T Y
Sbjct: 1 MTQDYRVLLYYQYVPIETOETFAQKHLADCKEIfiLKGRILVZmEGINGTVSGTIEQTimY 60

Query: 61 MDWVHSDERFADLWFKIDEENQQAFRKMFVRYKKEIVHLGLEDNNFDSDINPLETTGEYL 120

M+ + +D RF+ FKIDE Q AF+KM VRY+ E+V+L LED D+NPLE TG YL
Sbjct: 61 MELMKNDPRFSSTIFKIDEfiEQMAFKKMHVRYRPELVNLSLED DVNPLELTGAYL 115

Query: 121 NPKQFKBALLDEDTVVLDTRNDYEYDLGHFRGAIRPDIRNFEELPQWVRDNKDKFMEKRV 180

+PK+F+EA+LDE+TW+D RNDYE+DLGHFRGAIRP+IR+FRELPQW+RDNK++FMEKRV
Sbjct: 116 DPKEFREAMLDENTWIDARNDYEFDLGHFRGAIRPEIRSFRELPQWIRDNKEQFMEKRV 175

Query: 181 VVYCTGGVRCEKFSGWMVREGFKDVGQLHGGIATYGKDPEVQGELWDGAMYVFDDRISVP 240

+ YCTGG+RCEKFSGW+VHEGFKDVGQL GGIATYGKDPEVQG+LWDG MYVFD RI+VP
Sbjct: 176 LTYCTGGIRCEKFSGWLVREGFKDVGQLLGGIATYGKDPEVQGDLWDGQMYVEDSRIAVP 235

Query: 241 INHVNPTVISKDYFDGTPCERYVNCANPFCNKQIFASEENEAKYVRGCSPECRAHERNRY 300

IN ++ +D+FDG+PCERY+NC NP CN+Q+ ASEENEAKY+ CS ECR H NRY
Sbjct: 236 INQKEHVIVGRDWFDGSPCERYINCGNPECJffiQMLASEENEAKYLGACSHECRVHPNNRY 295

Query: 301 VQENGLSRQEWAERLEAIGESL 322

++ + LS QE ERL + + L
Sbjct: 296 IKAHQLSNQEVQERIALLEKDL 317
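
Each database hit in these listings opens with a header of the form `>GP:<protein id> GB:<nucleotide id> <description> [<organism>]`. The following sketch splits such a header into its parts; the function name `parse_header` and the field names are labels chosen here for illustration, not terms used by the source.

```python
import re

HEADER = re.compile(
    r"^>GP:(?P<protein_id>\S+)\s+GB:(?P<genbank>\S+)\s+"
    r"(?P<description>.*?)\s*\[(?P<organism>[^\]]+)\]\s*$"
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER.match(line.strip())
    return match.groupdict() if match else None


hit = parse_header(">GP:AAB81912 GB:U92974 unknown [Lactococcus lactis]")
print(hit)
```

Headers that wrap over several lines, or whose organism bracket is truncated, would need to be joined or handled separately before parsing; the sketch returns None for them.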





A related DNA sequence was identified in S.pyogenes <SEQ ID 6453> which encodes the amino acid 
sequence <SEQ ID 6454>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2443 (Affirmative) < suco 

bacterial membrane Certainty= 0.0000 (Not Clear) < suco 

bacterial outside — Certainty= 0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 321/324 (99%), Positives = 323/324 (99%) 
Query: 1 MSEKIRVLLYYKYVSIENftEEYAAKHLEFCKSIGLKGRILIADEGINGTVSGDYETTQKY 60 
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MSEKIRVLLYYKYVSIENA+EYAAKHLEFCKSIGLKGRIIiIflDEGINGTVSGDYETTQKY 
Sbjct: 1 MSEKIRVLLYyKyVSIENAQEYAAKHLEFCKSIGLKGRIIiIADEGINGTVSGDYETTQKY 60

Query: 61 MDWVHSDERFMLWPKIDEENQQAFRKMFTOYKKEIVHLGLEDNNFDSDINPr.ETTGEYL 120 
^roWVHSDERFADLWFKIDEENQQRFRKMFTOYKKEIVHLGLEDNNFDSDINPLETTGEYL

Sbjct: 61 MDWVHSDERFADLWFKIDEENQQAFRKMFVRYKKEIVHLGLEDNNFDSDINPLETTGEYL 120 

Query: 121 NPKQFKEALLDEDTWLDTRNDYEYDLGHFRGAIRPDIRNFRELPQWVRDNKDKFMEKRV 180 
NPKQFKEALLDEDTWLDTRNDYEYDLGHFRGAIRPDIRNFRELPQWVRDNKDKFMEKRV 
Sbjct: 121 NPKQFKEALLDEDTWLDTRNDYEYDLGHFRGAIRPDIRNFRELPQWVRDNKDKFMEKRV 180

Query: 181 VVYCTGGWCEKFSGWMVREGFKEJVGQLHGGIATYGKDPEVQGELWDGftMYVFDDRISVP 240 

WYCTGGVRCEKFSGWMVREGFKDVGQLHGGIATYGKDPEVQGELWDGZiMYVFDDRISVP 
Sbjct: 181 VVYCTGGVRCEKFSGWMVREGFKDVGQLHGGIATYGKDPEVQGELWDGAMYVFDDRISVP 240 


Query: 241 INHVNPTVISKDYFDGTPCERYVNCANPFCNKQIFASEENEAKYVRGCSPECRAHERNRY 300 

INHVNPTVISKDYFDGTPCERYVNCftNPFCNKQIFASEENE KYVRGCSPECRAHERNRY 
Sbjct: 241 INHVNPTVISKDYFDGTPCERYWCMIPFaSIKQIFASEENETKYVRGCSPECRAHERNRY 300 

Query: 301 VQKNGLSRQEWJiERIBAlGESLPQ 324

VQENGLSRQEWAERLEAIGESLP+ 
Sbjct: 301 VQENGLSRQEWAERLEAIGESLPE 324 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2085 

A DNA sequence (GBSx2200) was identified in S.agalactiae <SEQ ID 6455> which encodes the amino 
acid sequence <SEQ ID 6456>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 






>GP:AAC83954 GB:L47648 putative [Bacillus subtilis] 
Identities = 54/192 (28%) , Positives = 89/192 (46%) , Gaps = 14/192 (7%) 

Query: 5 QTIIIGAGAAGIGFGSAMQRLGLTNFLIIEKGHIGESFLRWPRTTQFITPSFTTNGFGFP 64 

+ IIIG G G+ ++++G+ + L+IEKjG++ S +P F + S 
Sbjct: 5 KAIIIGGGPCGLSAAIHLKQIGI-DALVIEKGNVVNSIYNYPTHQTFFSSSBKIiE 58 

Query: 65 DtNAVIPDTSPAFSFEKEHLSGVEYARYLQLVAAHYNLPIQNETSVLSIDK-RDSLPVIK 123

IDAFE ++Y + V N+1- V+K +++ FVI+ 

Sbjct: 59 IGDV--AFITENRKPVRIQRLSYYREVVKRKNIRVNAPEMVRKOTKTOIOTW^ 111 

Query: 124 TSR3DFSADYLIMATGEFQNPNTIDIKGADLGMHYGQVDNFHIKSDNPFIIIGGNESACD 183 
TSK ++ Y I+ATG + +PN + + G DL + H D ++IGG S+ D

Sbjct: 112 TSKETYTTPYCIIATGYYDHPNYMGVPGEDLPKVFHYFKEGHPYFDKDWVIGGKNSSVD 171

Query: 184 ALTHLVYLGNQV 195 
A LV G +V 
Sbjct: 172 AALELVKSGARV 183



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8973> and protein <SEQ ID 8974> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: 5.05
GvH: Signal Score (-7.5): -3.14

Possible site: 57
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 0 value: 0.26 threshold: 0.0
PERIPHERAL Likelihood =0.26 6 
modified ALOM score: -0.55 



*** Reasoning Step: 3 



Final Results 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 



The protein has homology with the following sequences in the databases: 

33.2/56.1% over 281aa 

Bacillus subtilis 

EGAD|109228| hypothetical protein Insert characterized

GP|2635109|emb|CAB14605.1||Z99117 alternate gene name: yrdP Insert characterized
GP|1934657|gb|AAB80908.1||U93876 hypothetical protein YrdP Insert characterized
PIR|E69725|E69725 potassium uptake trkA - Insert characterized

ORF01799(310 - 1128 of 1725) 

EGAD|109228|BS2656(2 - 283 of 345) hypothetical protein {Bacillus subtilis}
GP|2635109|emb|CAB14605.1||Z99117 alternate gene name: yrdP {Bacillus subtilis}
GP|1934657|gb|AAB80908.1||U93876 hypothetical protein YrdP {Bacillus subtilis}
PIR|E69725|E69725 potassium uptake trkA - Bacillus subtilis
%Match = 6.1

%Identity = 33.2 %Similarity = 56.0

Matches = 77 Mismatches = 88 Conservative Sub.s = 53



270 300 330 360 390 417 444 474 

YYC*LVKYFILHIYFCQGEDMKHYQTIIIGAGAAGIGFGSAMQRLGLTNFLIIEKGH-IGESFL-RWPRTTQFITPSFTT 

I Ihllll III I |:|::| I :|||: 1= I = ==: 

MYDTIVIGAGQAGISIGYYLKQ-SDQKFIILDKSHEVGESWKDRYDSLVLFTSRMYSS 
10 20 30 40 50 

480 510 540 570 600 630 660 690 
NGFGFPDLNAVIPDTSPAFSFEKEHLSGVEYARYLQLVAAHYIJLPIQNETSVLSIDKRDSLFVIKTSKGDFS 

III I Ih : =111 I 1=1= I = Mlj:: == 

LPGMHLEGEKHGFPSKNEIV AYLKKYVKKFEIPIQLRTEVISVLKIKNYFLIKTNREEYQ 

70 80 90 100 110 

720 750 822 852 882 912 

ADYLIMATGEFQNPNTIDIKGADLG MHYGQVDNF-HIKSDNPFIIIGGNESACDALTHLVYLGNQVELYTDTFGR 



TKNLVIATGPFHTPNIPSIS-KDLSDNINQLHSSQYKNSKQLAYGNVLWGGGNSGA- 
130 140 150 160 170 



942 969 996 1026 

KESNPDPSISLS-PLTKERLKHIQ-DHKKEYYSISEGKKAI--EIKQIG 

:: |:|||: :: :| |: : ||::| ::| 

QIAVELSKERVTYLACSNKLVYFPLMIGKRSIFWWFDKLGVLHASHTSIVGKFIQKKGDPVFGHELKHAIK 

180 190 200 210 220 230 240 



1068 1098 1128 1158 1188 1218 1248 

KQYQVTFDDGSTAESFHKPILSTGFLNTCHLIDGIALFEYDKNQLPIVTEDDESTIVNNCFLIGPSL 

II := II II I = 1 :|ll 1 |: ::: : : : : 

QKEIILKKRVIAAKQNEIIFKDSSTIB-VNNIIWATGFRHPLCWINIKGVLDQEGRIIHHRGVSPVEGLYFIGLPWQHKR 
260 270 280 290 300 310 320 
SEQ ID 8974 (GBS284) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 52 (lane 10; MW 42.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 58 (lane 9; MW 67.6kDa). 

GBS284-GST was purified as shown in Figure 225, lane 7. 

Example 2086

A DNA sequence (GBSx2201) was identified in S.agalactiae <SEQ ID 6457> which encodes the amino 
acid sequence <SEQ ID 6458>. This protein is predicted to be NrgA-like protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 24 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0. 5692 (Affirmative) < suco

bacterial outside Certainty=0. 0000 (Not Clear) < suco

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 9997> which encodes amino acid sequence <SEQ ID 9998> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15668 GB:Z99122 ammonium transporter [Bacillus subtilis] 
Identities = 105/378 (27%) , Positives = 181/378 (47%) , Gaps = 41/378 (10%)

Query: 3 VKKGLFVFLLLCILSMWLMIFGVAFYYFGSLH-QSLTSRIIYQFVLTVLLTTTAWBMGAY 61 

++ G VF+ C L +WU1 G+A +Y G + +++ S ++F ++ + + W + Y 
Sbjct: 1 MQMGDTVFMFFCALLVWLOTPGLALFYGGMVKSKNVLSTAMHSFS-SIAIVSIVl^ 59 

Query: 62 FLAFEGHFKTVFQFQEADGKQI VNCLFQLCFALYAWMLIGSIIDR 107

LAF ■ + + A K + + +FQ+ FA+ ++ G+ +R 

Sbjct: 60 TLAFAPGNSIIGGLEWAGLKGVGFDPGDYSDTIPHSLFMMFQMTFAVLTTAIISGAFAER 119 

Query: 108 VQTKRLLLAWSWLFLVYTPLAYLIWNSEGVFAKMGVLDFSGGMIVHLSAGLSSYILAHV 167

++ LL V W LVYTP+A+ +W G ++G LDF+GG +VH+S+G++ +LA V 
Sbjct: 120 ^D^FGAFLLFSVL^^ASLVYTPVAHWVWGG-GWIGQLGALDFAGGNVVHISSGVAGLVLAIV 178 

Query: 168 IGK SEHQHNKVKNDSLFLGMILITFGWFGFNMGPVGEWNSQAIMILLNTIFAIIG 222 

+GK + HN + FLG LI FGWFGFN+G + A+ +NT A

Sbjct: 179 LGKRKDGTASSPHNLIYT FLGGALIWFGWFGFNVGSALTLDGVAMYAFINTNTAAAA 235 

Query: 223 GGLAVWLAAKWNGEEEKTGSLXJSGIIVGLVTSTAGVGYLLTWQLLAVTFFASLFTYFVTD 282 
G W L ++ ++G I GLV T G++ + + + ++ 

Sbjct: 236 GIAGWILVEWIINKKPTMLGAVSGAIAGLVAITPAAGFVTPFASIIIGIIGGAVCFWGVF 295

Query: 283 YVAKAFAIDDWSSFGMNGIGGLLGSLGVGLFKLSHMP VQLLAL 326 

+ K F DD + +FG++GIGG G + GLF + + Q++A+ 
Sbjct: 296 SLKKKFGYDDALDAFGLHGIGGTWGGIATGLFATTSVHSAGADGLFYGDASLIWKQIVAI 355 



Query: 327 ATTILLSIIMTYIISKAI 344 

A T + I+T++I K + 
Sbjct: 356 AATYVFVFIVTFVIIKIV 373 
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No corresponding DNA sequence was identified in S.pyogenes. 



A related GBS gene <SEQ ID 8975> and protein <SEQ ID 8976> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 



GvH: Signal Score (-7.5): -4.07 

Possible site: 24 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 9 value: -11.73 threshold: 0.0 
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modified ALOM score: 2.85 



*** Reasoning Step: 3 

Final Results 

bacterial membrane — Certainty=0. 5692 (Affirmative) < suco

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm — Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF01800(307 - 1332 of 1641)

EGAD|19589|BS3646(1 - 373 of 404) probable ammonium transporter {Bacillus subtilis}
OMNI|NT01BS4254 ammonium transporter SP|Q07429|NRGA_BACSU PROBABLE AMMONIUM TRANSPORTER
(MEMBRANE PROTEIN NRGA). GP|143264|gb|AAA17399.1||L03216 membrane-associated protein
{Bacillus subtilis} GP|1684645|emb|CAB05374.1||Z82987 unknown {Bacillus subtilis}
GP|2636176|emb|CAB15668.1||Z99122 ammonium transporter {Bacillus subtilis}

PIR|A36865|A36865 ammonium transporter nrgA - Bacillus subtilis
%Match = 13.5

%Identity = 30.0 %Similarity = 54.3

Matches = 104 Mismatches = 149 Conservative Sub.s = 86

144 174 204 234 264 294 324 354 

PFSMIRKFVSPNRCMftEPKBIPAAPAPIXMV**CFMSSP*QK*MCKIK:YLTS*Q*YSLTNKRVFVKKGLFVFLLLCILSM 



McG: Discrim Score: 17.19



MQMGDTVFMFFCALLV 
■ 10 



384 411 441 471 501 531 

WLMIFGVAFYYFGSLH-QSLTSRIIYQFVLTVLLTTTAWFMGAYFLAFEGHFKTVFQFQEADGKQI 




30 40 50 60 70 80 90 



579 609 639 669 699 729 759 789 

WCLFQLCFALYAVVMLIGSIIDRVQTKRLLLAWSVn^FLVYTPLAYLIWNSEGVEaKMGVLDFSGGMIV^ 



LFMMFQMTFAVLTTAIISGAFAERMRFGAFLLFSVLWASLVYTPVAHWVWGG-GWIGQLGALDFAGGIJVVHISSGVAGLV 
110 120 130 140 150 160 170 




819 849 873 903 933 963 993 1023 

LAHVIGKSEHQHNKWNDSLF--LGMILITFGWFGFNMGPVGEV™SQAIMILIOTIFAIIGGGLAmijAftK™ 



M I H I ■ .... I I I I I - I • I • • • 1 1 I I II -I 

lAIVLGKRKDGTASSPHNLIYTFLGGRLIWFGWFGBWGSALTIDGVAMYJ^INTNTAAAAGIAGWIL-VEWIINI^^ 
190 200 210 220 230 240 250 
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1050 1080 1110 1140 1170 . 1200 1230 1260 

-SLLMGIIVGLOTSTAGVGYLLTWQLIAWFFASLFTYFVTDYXAKAFAIDDWSSRSMNGIG^^ 

:: I I I I I I I : : : : : : : | I I I = = I 1 = : I I I i I : | | | = : 

LGAVSGAIAGLVAITPAAGFVTPFASIIIGIIGGAVCFWGVFSLKKKFGYDDALbAFGLHGIGGTWGGIATGLFATTSVN 
270 280 290 300 310 320 330 

1272 1302 1332 1362 1392 1422 1452 

V QLIAIATTILLSIIMTYIISKAIFRK**IRLRCTSQPYLLP*QGE*LISIRIINHFHY*TLSXX* 



350 360 370 380 390 400 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2087 

A DNA sequence (GBSx2202) was identified in S.agalactiae <SEQ ID 6459> which encodes the amino 
acid sequence <SEQ ID 6460>. This protein is predicted to be dUTPase (dut). Analysis of this protein 
sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 2731 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9471> which encodes amino acid sequence <SEQ ID 9472> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA72644 GB:Y11901 dUTPase [Lactococcus lactis] 
Identities = 67/144 (46%) , Positives = 90/144 (61%) , Gaps = 8/144 (5%) 

Query: 40 RGFELVSQFSNKELnPKRETAHARGYDLKVAKKTVIEPGEITLVPTGIKAYMQPGEVLYL 99 

RGF+ + +P+R T H+AGYD+ ++ I+P EI +V T6+ + EVL L 

Sbjct: 3 RGFK---KLDGN&TIPERATKHSaGYDISASETVTIQPDEIKMVSTGLAVQL6DDEVLKL 59 

Query: 100 YDRSSNPRKKGIVLINSVGVIDGDYYIWrQVNEGHIFAQMQNITDQAVILEEGERIVQAVF 159 
YDRSSNP K+GI LINSVG+ID DYY + NI+ + V + +G+RI+Q VF 

Sbjct: 60 YDRSSNPVKRGIALINSVGIIDSDYYPQEFK GLFMNISKEPVTISKBQRIMQGVF 114

Query: 160 APFLLRDDDQATGMRTGGFGSTGK 183 

+L DDD A G RTGGFGSTG+ 
Sbjct: 115 VKYLTIDDDNANGKRTGGFGSTGE 138 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6461> which encodes the amino acid
sequence <SEQ ID 6462>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2519 (Affirmative) < suco

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 
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Identities = 115/148 (77%) , Positives = 125/148 (83%) 

Query: 36 MSKVRGFELVSQFSNKELLPKRETAHAAGYDLKVAKKTVIEPGEITLVPTGIKAYMQPGE 95 

M+K+RGFELVS F+N +LLPKRET HAAGYDL VA+ I PGEI LVPTG+KAYMQ GE 
Sbjct: 1 MTKIRGFELVSSFTNPDLLPKRETTHftAGYDLSVaEZWTIAPGEIKLVPTGVKAYMQDGE 60 

Query: 96 VLYLYDRSSNPRKKGIVLINSVGVIDGDYYNWQVNEGHIFAQMQNITDQAVILEEGERIV 155 

VLYLYDRSSNPRKKGI+LINSVGVID DYY N+ NEGHIFAQMQNITD V L GERIV 
Sbjct: 61 VLYLYDRSSNPRKKGIILINSVGVIDADYYGNETOIEGHIFAQMQNITDHPVTLAVGERIV 120 

Query: 156 QAVFAPFLLADDDQATGMRTGGFGSTGK 183 

Q VF PFL+AD DQA G RTGGFGSTG+ 
Sbjct: 121 QGVFMPFLIADGDQARGERTGGFGSTGQ 148 
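
The Identities figure of an alignment is the number of columns in which the Query and Sbjct rows carry the same residue. A minimal sketch of that count for one pair of aligned rows follows; `count_identities` is a hypothetical helper, and the Positives figure is not reproduced because it depends on a substitution matrix that the listing does not state. The two strings are the final rows of the alignment above.

```python
def count_identities(query, sbjct, gap="-"):
    """Count columns where both aligned rows hold the same residue.

    Both rows must be the same length (gaps already inserted); columns where
    either row has a gap are never counted as identities.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != gap)


query = "QAVFAPFLLADDDQATGMRTGGFGSTGK"  # Query residues 156-183
sbjct = "QGVFMPFLIADGDQARGERTGGFGSTGQ"  # Sbjct residues 121-148
identical = count_identities(query, sbjct)
print(f"{identical}/{len(query)} identical in this row")
```

Summing the per-row counts over all rows of an alignment gives the numerator of the Identities ratio, provided the residues themselves were transcribed without error.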

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2088 

A DNA sequence (GBSx2203) was identified in S.agalactiae <SEQ ID 6463> which encodes the amino 
acid sequence <SEQ ID 6464>. This protein is predicted to be RadA homolog (radA). Analysis of this 
protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm — Certainty=0. 2628 (Affirmative) < suco

bacterial membrane Certainty=0. 0000 (Not Clear) < suco

bacterial outside Certainty=0. 0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11863 GB:Z99104 DNA repair protein homolog [Bacillus subtilis] 
Identities = 285/453 (62%) , Positives = 358/453 (78%) , Gaps = 4/453 (0%) 



Query: 1 MAKKKSVFTCQEaSYQSPKYLGRCPNCSAWSSFVEEVEVQEVKNRRVSLNGEKSRPTKLK 60

MAK KS F CQ CGY+SPK++G+CP C AW++ VEE+ + N R + + K
Sbjct: 1 MAKTKSKFICQSCGYESPKWMGKCPGCGAWNTMVEEMIKKAPAlffiRARFSHSVQTVQKPS 60

Query: 61 DVSSINYS---RTKTDMDEENRVLGGGWPGSLVLIGGDPGI6KSTLLLQVSTQLA-NKG 116

++SI S R KT + EFNRVLGGGW GSLVLIGGDPGIGKSTLLLQVS QL+ +
Sbjct: 61 PITSIETSEEPRVKTQLGEENRVLGGGWRGSriVLIGGDPGIGKSTLLLQVSAQLSGSSN 120

Query: 117 TVLYVSGEESAEQIKLRSERLGDIDNEFYLYAETNMQSIRSEIEKIKPDFLIIDSIQTIM 176

+VLY+SGEES +Q KLR++RLG + ++ +ET+M+ I S P F+++DSIQT+
Sbjct: 121 SVLYISGEESVKQTKLRADRLGIimPSLHVLSETDlffiYISSAIQEMNPSFVVTO 180

Query: 177 SPEVSSVQGSVSQVREVTAEMQLAKTimiATPIVGHVTKEGTLRGPRMLEHMVDTVLYF 236

+++S GSVSQVRE TAELM++ARr I FIVGHVTKEG++AGPR+LEHMVDTVLYF
Sbjct: 181 QSDITSAPGSVSQVRECTAE1:jMKIAKTKGIPIFIVGHVTKEGSIAGPRIJ1.EHMVDTVLYF 240

Query: 237 EGERHHTFRILRAVKNRFGSTNEIGIFEMQSGGLVEVLNPSQVFLEERLDGATGSAIWT 296

EGERHHTFRILRAVKNRFGSTNE+GIFEM+ GL E\™pS++FLEER G+ GS+I +
Sbjct: 241 EGERHHTFRILRAVKNRFGSTNEMGIFEMREEGLTEVLNPSEIFLEERSAGSAGSSITAS 300

Query: 297 MEGTRPILAEVQALVTPTVFGKAKRTTTGLDENRVSLIMAVLEKRCGLLLQNQDAYLKSA 356

MEGTRPIL E+QAL++PT PGN +R TG+D NRVSIi+MAVLEKR GLLLQNQDAYLK A
Sbjct: 301 MEGTRPILVEIQALISPTSFGNPRRMATGIDHNRVSLLMAVLEKRVGLIiLQNQDAYLKWA 360

Query: 357 GGVKLDEPAIDLAVAVAIASSYKEKPTNPQESFIGEIGLTGEIRRVTRIEQRINEASKLG 416

GGVKIiDEPAlDLA+ ++IASS+++ P NP + FIGE+GLTGE+RRV+RIEQR+ EA+KLG
Sbjct: 361 GGVKLDEPAIDLAIVISIASSFRDTPPNPADCFIGEVGLTGEVRRVSRIEQRVKEAAKLG 420

Query: 417 FTKIYAPKNSLAGIEIPKGIDVIGVTTVSQVLK 449

F ++ P +Ij G PKGI+VIGV V++ L+
Sbjct: 421 FKRMIIPAANLDGWTJCPKGIEVXGVaNVaEALR 453

A related DNA sequence was identified in S.pyogenes <SEQ ID 6465> which encodes the amino acid 
sequence <SEQ ID 6466>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm — Certainty=0. 2191 (Affirmative) < suco 

bacterial membrane — Certainty=0. 0000 (Not Clear) < suco

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below.

Identities = 416/453 (91%) , Positives = 441/453 (96%) 



Query: 1 MAKKKSVFTCQECGYQSPKYLGRCPNCSAWSSFVEEVEVQEVKNRRVSIiNGEKSRPTiajK 60

MAKKK+ F CQECGYQSPKYLGRCPNCSAWSSFVEEVEV+EVKNARVSL GEKSRP KLK
Sbjct: 1 MAKKKATFICQECGYQSPKYLGRCPNCSAWSSFVEEVEVKEVKNARVSLAGEKSRPVKLK 60

Query: 61 DVSSINYSRTKTDMDEFNRVLGGGWPGSLVLIGGDPGIGKSTLLLQVSTQLANKGTVLY 120

DV +I+Y RT+TDM EFNRVLGGGWPGSL+LIGGDPGIGKSTLLLQVSTQljaNKGTVLY
Sbjct: 61 DVDNISYHRTQTDMSEFmVLGGGWPGSLILIGGDPGIGKSTLLLQVSTQLfiNRGTVLY 120

Query: 121 VSGEESAEQIKLRSERLGDIDNEFYLYAETNMQSIRSEIEKIKPDFLIIDSIQTIMSPEV 180

VSGEESAEQIKLRSERLGDIDNEFYLYAETNMQ+IR+EIE IKPDFLI1DSIQTIMSP++
Sbjct: 121 VSGEESiffiQIKLRSERIfiDIDKEFYLYAETNMQAIRTEIENIKPDFLIIDSIQTIMSPDI 180

Query: 181 SSVQGSVSQVREVTAELMQrAKIOTIATFIVGHVTKEGTLAGPRMLEHMVDTVLYFEGER 240

+ VQGSVSQVREVTAELMQLAKimiATFIVGHVTKEGTLAGPRMLEHMVDTVLYFEGER
Sbjct: 181 TGVQGSVSQVREVTAEIMQLAIOmiATFIVGHVTKEGTLftfiPRMLEHMVDTVLYFEG 240

Query: 241 HHTFRII^VKNRFGSTlSEIGIFEMQSGGnVEVLNPSQWLEERLDGATGSAIWrMEGT 300

HHTPRILRAVKNRFGSTNEIGIFEMQSGGLVEVLNPSQVPLEERLDGATGSA+WTMEG+
Sbjct: 241 HHTFRILRAVKNRFGSTNEIGIFEMQSGGLVEVLNPSQVFLEERLDGATGSAVWTMEGS 300

Query: 301 RPILAEVQALVTPTVFGNAKRTTTGLDFNRVSLIMAVLEKRCGLLLQNQDAYLKSAGGVK 360

RPIIiftEVQ+LVTPTVFGNA+RTTTGLDFNRVSIiIMAVLEKRCGLLLQNQDAYLKSAGGVK
Sbjct: 301 RPII^VQSLVTPTVFGaSBUlRTTTGLDFNRVSLIMAVLEKRCGLIiGNQDAYLKSAGGV 360

Query: 361 LDEPAIDLAVAVAIASSYKEKPTOPQESFIGEIGLTGEIRRVTRIEQRINEASKLGFTKI 420

LDEPAIDLAVaV&IASSYKEKPT+PQE+F+GEIGLTGEIRRVTRIEQRINEA+KI,GFTK+
Sbjct: 361 IiDEPAIDIA\«WAIASSYKEKPTSPQEAFIX5EIGLTGEIRROTRIEQRINEAAKLGFTKV 420

Query: 421 YAPKNSLAGIEIPKGIDVIGVTTVSQVLKAVFS 453

YAPKN+L GI+IP+GI+V+GVTTV QVL AVPS
Sbjct: 421 YAPKNALQGIDIPQGIEVVGVTIT7GQVIJIIAVFS 453





Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2089 

A DNA sequence (GBSx2204) was identified in S.agalactiae <SEQ ID 6467> which encodes the amino 
acid sequence <SEQ ID 6468>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 3488 (Affirmative) < suco 

bacterial membrane — Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 
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The protein has homology with the following sequences in the GENPEPT database. 

>GP:C2yV97750 GB:Z73419 hypothetical protein Rv1284 [Mycobacterium
tuberculosis]

Identities = 69/162 (42%) , Positives = 100/162 (61%) , Gaps = 2/162 (1%) 

Query: 3 TYFDNFLKTNQAYADLHGTMLPIKPKTKAfAIVTCMDSRLHVAQALGLaLGDiffllL^ 62 

T D++I1 N YA LP+ P +AIV CMD+RL V + LG+ G+aH++HNaG 

Sbjct: 2 TVTDDYLAMIVDYASGF-KGPLPMPPSKHIAIVACMDARLDVYRMLGIKEGE 60 

Query: 63 GRVTDDVLRSLVISQQQLGTREIWLHHTDCGAQTFTNEAFAAQLQRDLGVDMHGHDFLP 122 

VTDDV+RSL ISQ+ LGTREI++LHHTDCG TFT++ F +Q + G+ 
Sbjct: 61 CWrDDVIRSLAISQRLLGTREIILLHHTDCX3MLTFTDDDFKRAIQDETGIRPTWSP-ES 119 

Query: 123 FNDIEESVREDVaKLHASPLIPDDWISGAIYDVDTGRMVEV 164

+ D E VR+ + ++ +P + + G ++DV TG++ EV 

Sbjct: 120 YPDAVEDVRQSLRRIEVNPFVTKHTSLRGFVFDVATGKLNEV 161 

There is also homology to SEQ ID 6470: 

Identities = 126/164 (76%) , Positives = 146/164 (88%)

Query: 1 MTTYETWPLKaMQAYADUIGTAHLPIKPKTKWAIVTCMJSRLHVAQRLGL^ 60 

+ +YF++F+ NQAY liHGTAHLP+KPKTKVRlVTCMDSRLHVAQALGLALGDMILFN 
Sbjct: 1 LMSYFEHFMaflNQAYVALHGTAHLPLKPKTKVAIOTCMDSRLHVaQALGrJUKSI^ 60 


Query: 61 AGGRVTDDVLRSLVISQQQLGTREIWLHHTDCGAQTFTNEAFAAQLQRDLGVDMHGHDF 120 

AGGRVT+D++RSLVISQQQ+GTREIWLHHTDCGAQTFraE FA + LGVD+ G DF 
Sbjct: 61 AGGRVTEDMIRSLVISQQQMGTREIVVLHHTDCXaQTFTNEGFAKHIHEHLGVDVSGQDF 120 

Query: 121 LPFNDIEESVREDVAKLHASPLIPDDWISGAIYDVDTGRMVEV 164

LPF D+E+SVRED+AK+ AS LI DDWI+GA+YDVDTG+M +V 
Sbjct: 121 LPFQDVEDSVREDMAKIRASSLISDDWINGAVYDVDTGKMTQV 164 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
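
The summary lines that head each alignment above (Identities, Positives, Gaps) follow a regular pattern, so they can be pulled into numbers for comparison across hits. The following is a minimal sketch only; the helper name `parse_blast_summary` and the regular expression are illustrative assumptions and are not part of the original analysis. The reported percentages are kept exactly as printed rather than recomputed.

```python
import re

# Hypothetical helper (an assumption for illustration, not from the source):
# matches "Identities = 69/162 (42%)" style fields, tolerating stray spaces.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in SUMMARY_RE.finditer(line)
    }


# Summary line of the GENPEPT hit reported for SEQ ID 6468.
summary = parse_blast_summary(
    "Identities = 69/162 (42%) , Positives = 100/162 (61%) , Gaps = 2/162 (1%)"
)
for field, (count, length, reported) in summary.items():
    # The raw ratio is shown next to the percentage as printed in the report.
    print(f"{field}: {count}/{length} = {count / length:.3f} (reported {reported}%)")
```

Keeping the count and the aligned length separate from the printed percentage makes it possible to compare hits of different lengths without relying on how the percentage was rounded.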

Example 2090 

A DNA sequence (GBSx2205) was identified in S.agalactiae <SEQ ID 6471> which encodes the amino 
acid sequence <SEQ ID 6472>. Analysis of this protein sequence reveals the following: 

Possible site: 19 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0536 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9473> which encodes amino acid sequence <SEQ ID 9474> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC73407 GB:AE000137 putative oxidoreductase [Escherichia coli K12]

Identities = 199/438 (45%), Positives = 286/438 (64%)

Query: 1 MKKYDVIVLGFGKAGKTIiRAKIiATQGKSVAMVEEDDKMYGGTCINIGCIPTKTLLVSASK 60 
M KY +++GFGKAGKTI1A lA G VA++E+ + MYGGTCINIGCIPTKTL+ A + 
Sbjct: 10 MNKYQAVIIGFGKAGKTIAVTIAKAGWRVaLIEQSNftMYGGTCINIGCIPTKTLV^ 69



Query: 61 iraDFQEaMTTRlffiVTSRr.RAKNFAMLDNKDTVDVYNAKARFISNKVVELTGGaDKQELTA 120 
+ DF A+ +NEV + LR KNF L + +DV + +A FI+N + + E+ 
Sbjct: 70 HTDFVRAIQRKlffiVVNFLRNKNFHNIJmMPNIDVIDGCjaEFIimHSLRVHRPEG 129

Query: 121 DVIIINTGAKSVQLPIPGLftDSQHVYDSTAlQEIiftHLPKRLGIIGGGNIGLEFATLySEL 180 

+ I INTC3A++V PIPG+ + VTOST + L LP LGI+GGG IG+EFA++++ 
Sbjct: 130 EKIFINTGAQTVVPPIPGITTTPGVYDSTGLIiNLKELPGHIKSILGGGYIGVEFASM 189 

Query: 181 GSKVTVIDSQSRIFAREEEELSEMAQDYLEEMGISFKLSflDIKSVQNEDEDWISFEDEK 240 

GSKVT++++ S RE+ ++++ L + G+ Ii+A ++ + + + V+ E + 

Sbjct: 190 GSKVTILEiaASLFLPREDRDIAmiATIIiia)QGVDIIIiNaHVERISHHENQVQVHSEH^^ 249 

Query: 241 LSFDAVLYATGRKENTEGLA]MqTDIKLTERGAIAVDEYCQTSVENIPA^raDTO 300 

L+ DA+L A+GR+P T L EN I + ERGAI VD+ T+ +NI+A+GDV GG QFT 
Sbjct: 250 LAVDALLIASGRQPATASLHPElOiGIAVNERGAIVVDKRIflTTADNIWAMGDVTGGLQFT 309 

Query: 301 YISLDDSRIVLNYLNCDKDYSLKNRGAVPTSTFTNPPIATVGLDEKTAKEKGYQVKSNSL 360

YISLDD RIV + L + S +R VP S F PPL+ VG+ E+ A+E G ++ +L 
Sbjct: 310 YISLDDYRIVRDELLGEGKRSTDDRKNVPYSVFMTPPLSRVGMTEEQARESGADIQWTL 369 

Query: 361 LVSAMPRAHVMOTDLRGIFKVVVDTEimill^SSUlLPGAESHELINIITMAMDN^ 420 
 V+A+PRA V ND RG+ K +VD +T +LGA L +SHE+INI+ M MD +PY+ +

Sbjct: 370 PVAAIPRARVMlTOTRGVLKAIVnim^RMIfiASLLCVDSHEMINIVKI^^ 429 

Query: 421 KQIFTHPTMVENENDLEN 438 
QIFTHP+M E+ m3LF+ 
Sbjct: 430 DQIFTHPSMSESENDLFS 447

There is also homology to SEQ ID 1820. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2091

A DNA sequence (GBSx2206) was identified in S.agalactiae <SEQ ID 6473> which encodes the amino
acid sequence <SEQ ID 6474>. This protein is predicted to be glutamyl-tRNA synthetase (gltX). Analysis
of this protein sequence reveals the following:

Possible site: 43 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2245 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9475> which encodes amino acid sequence <SEQ ID 9476> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10953> which encodes amino 
acid sequence <SEQ ID 10954> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC31971 GB:U49789 glutamyl-tRNA synthetase [Bacillus subtilis] 
Identities = 273/491 (55%) , Positives = 353/491 (71%) , Gaps = 19/491 (3%) 

Query: 20 LANKIRVRYAPSPTGLLHIGNARTALENYLYARHHGGDFVIRIEDTDRKRHVEDGERSQL 79 
 + N++RVRYAPSPTG LHIGNARTALENYL+AR+ GG P+IR+EDTD+KR++E GE+SQL

Sbjct: 1 MCaSEVRVRYAPSPTGHLHIGNARTALFNYLFARNQGGKFIIRVEDTDKKRNIEGGEQSQL 60 

Query: 80 ENLRWLGMDWDESPET HENYRQSERLELYQRYIDQLLAEGKAYKSYVTEEELAAERE 136 

L+WLG+DWDES + + YRQSER ++Y+ Y ++LL +G AYK Y TEEEL ERE 
Sbjct: 61 NYLKmX3IDWDESVDVGGEYGPYRQSERNDIYiCVYYEEIiLEKGLAYKCYCTEEELEKERE 120

Query: 137 RQEIAGETPRYINEFIGMSETEKEAYIAEREAAGIIPTVRLAVNESGIYK!m)|ymCGD 196 
Q <3E PRY + +++ E+E +IAE G P++R V E + + D+VKG+I 
Sbjct: 121 EQIARGEMPRYSGKHRDLTQEEQEKFIAE GRKPSIRFRVPEGKVIAENDIVKGEIS 176

Query: 197 FEGSNIGGDWIQKTOGyPTyNFAWIDDHDMQISHVIRGDDHIAWrPKQLMVYEALGWE 256

FE IG D+VI KKDG PTYNFAV IDD+ M+++HV+RG+DHI+NTPKQ+iyi+Y+A GW+ 
Sbjct: 177 FESDGIG-DFVIVKKDGTPTYNFAVAIDDYLMKMTHVLRGEDHISNTPKQIMIYQAFGWD 235 

Query: 257 APQFGHMTLIINSETGKKLSKRDTOTI^JFIEDYRKKGYMSEAVENFIALLGWNPGGEEEI 316 

PQFGHMTLI+N E+ KKLSKRD + +QPIE Y++ GY+ EA+FNFI LLGW+P GEEE+ 
Sbjct: 236 IPQFGHMTLIVN-ESRKKLSKRDESIIQFIEQYKELGYLPEftLFNFIGLLGWSPVGEEEL 294 

Query: 317 FSREQLINLFDENRLSKSPAAFDQKKMDWMSNDYLKNADFESVFALCKPFLEEAGRL--- 373 

F++EQ I +FD NRLSKSPA FD K+ W++N Y+K D + V L P L++AG++ 
Sbjct: 295 FTKEQFIEIFDVNRLSKSPALFDimKLKWVimQYVKKLDLDQV^ 354 

Query: 374 TDKREKLVELYQPQLKSADEIVPLTDLFFJmFPELTEAEKEVMARETVPTVLSAF 428

+ KL+ LY QL EIV LTDLFF D E + K V+ E VP VLS F 
Sbjct: 355 LSAEEQEWVRKLISLYHEQLSYGAEIVELTDLFFTDEIEYNQEAKAVLEEEQVPEVLSTF 414 

Query: 429 KEKLVSLSDEEFTRDTIFPQIKAVQKETGIKGKNLFMPIRIAVSGEMHGPELPDTIYLLG 488 
 KL L EEFT D I IKAVQKETG KGK LFMPIR+AV+G+ HGPELP +1 L+G

Sbjct: 415 AAKLEEL--EEFTPDNIKASIKAVQKETGHRGKKLFMPIRVAVTGQTHGPELPQSIELIG 472 

Query: 489 KEKSVQHIDNM 499 
KE ++Q + N+ 
Sbjct: 473 KETAIQRLKNI 483

A related DNA sequence was identified in S.pyogenes <SEQ ID 6475> which encodes the amino acid 
sequence <SEQ ID 6476>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1966 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 434/481 (90%) , Positives = 459/481 (95%) 

Query: 20 LANKIRVRYAPSPTGLLHIGNARTALFNYLYARHHGGDFVIRIEDTDRKRHVEDGERSQL 79

++ IRVRYAPSPTGLLHIGKARTALENYLYAR HGG F+IRIEDTDRKRHVEDGERSQL 
Sbjct: 1 MSKPIRVRYAPSPTGLLHIGNARTALENYLYARRHGGTFIIRIEDTDRKRHVEDGERSQL 60 

Query: 80 ENLRWLGrTOWDESPETHEirrRQSERLELYQRYIDQLLftEGKaYKSYVTEEELaAERERQE 139 

 ENL+WLGMDWDESPETHENYRQSERL LYQ+YIDQLLAEGKAYKSYVTEEELAAERERQE

Sbjct: 61 ENLKWLGMDWDESPETHENYRQSERLALYQQYIDQLLAEGKAYKSYVTEEELAAERERQE 120 

Query: 140 LAGETPRYIIffiFIGMSETEKEAYIAEREAAGIIPTVRLAVNESGIYKWTDMVKBDIEFEG 199 
AGETPRYINEFIGMS EK YIAEREftAGI+PTVRIAVNESGIYKWTDMVKBDIEFEG 
Sbjct: 121 AAGETPRYIJTOFIGMSADEKRKYIAEREAaGIVPTVRLAVlffiSGIYKJmjM^ 180

Query: 200 SNIGGDWVIQKKDGYPTYNFAWIDDHDMQISHVIRGDDHIANTPKQLMVYEALGWERPQ 259 

NIGGDWIQKKDGYETYHFAVV+DDHIMQISHVIRGDDHIAm'PKQLMVYEALGWEAP+ 
Sbjct: 181 GNIGGDWVIQKKDGYPTYNFAVVVDDHDMQISHVIRGDDHIANTPKQLMVYEALGWEAPE 240 


Query: 260 FGHMTLIINSETGKKLSKRDTimjQFIEDYRKKGYMSEAVENFIALLGW^^ 319 

FGHMTLI INSETGKKLSKRDTNTLQFIEDYRKKGYM EAVFNFIALLGWNPGGEEEIFSR 
Sbjct: 241 FGHMTLIINSETGKJO^SKRDTimQFIEDYRKKGYMPEAVFNFIALLGWNPGGEEEIFSR 300 

Query: 320 EQLINLFDENRLSKSPAAFDQKKMDWMSNDYLKNADFESVFALCKPFLEEAGRLTDKAEK 379

EQLI LFDENRLSKSPAAFDQKKMDWMSN+YLK+ADFE+V+ALCKPFLEEAGRLT+KAEK 
Sbjct: 301 EQLIALFDENRLSKSPAAFDQKKMDWMSNEYLKHADFETVYALCKPFLEEAGRLTEKAEK 360 

Query: 380 LVELYQPQLKSADEIVPLTDLFFADFPELTEAEKEVMAAETVPTVLSAFKEKLVSLSDEE 439 
 LVELY+PQLKSADEI+PLTDLFF+DFPELTEftEKEVMA ETV TVL AFK KL ++SDE+




Sbjct: 361 LVELYKPQLKSADEIIPLTDLFFSDFPELTEAEKEVMAGETVSTVLQAFKAKLEAMSDED 420 

Query: 440 FTRDTIFPQIKAVQKETGIKGKKLFMPIRIAVSGEMHGPELPDTIYLLGKEKSVQHIDNMI, 500 
F + IFPQIKAVQKETGIKGKMLPMPIRIAVSGEMHGPELP+TIYLLG++KS++HI NML 
Sbjct: 421 FKPENIFPQIKAVQKETGIKGKNLFMPIRIAVSGE^ffiGPELPNTIYLI,GRDKSIEHIKNML 481

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2092 

A DNA sequence (GBSx2207) was identified in S.agalactiae <SEQ ID 6477> which encodes the amino
acid sequence <SEQ ID 6478>. This protein is predicted to be d-ribose-binding protein precursor, fragment
(rbsB). Analysis of this protein sequence reveals the following:

Possible site: 24 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15613 GB:Z99122 ribose ABC transporter (ribose-binding protein) [Bacillus subtilis]
Identities = 143/301 (47%), Positives = 205/301 (67%), Gaps = 1/301 (0%)

Query: 14 MSIVLILGACGKTGLGNSSGNSTKNVTKKSAKDLKLGVSISTTNNPYFVAMKDGIDKYAS 73
 +S++L L T K + K+ +G+S+ST NNP+FV++K GI+K A
Sbjct: 5 VSVILTLSLFLLTACSIiEPPQWAKPSNSGNKKEFTIGLSVSTLNNPFFVSLKKGIEKEAK 64

Query: 74 NKKISIKXn^QDDRRRQRDDVQNFlSQNTOAILINPVDSKAIVTAlKSANNANIPV^ 133
 + + + + nAQ+D+++Q DV++ I Q VDA+LINP DS AI TA++SAN +PV+ +
Sbjct: 65 KRGMKVIIVDAQNDSSKQTSOTEaDLIQQGVimiLINPTDSSAISTAVESaNAVGVPVVTI 124

Query: 134 DRGSEGGKOTjTTVASDNVAAGKMAADYAVKKLGKKAPCAFELSGVPGASATVDRGKGFHSV 193
 DR +E GKV T VASDNV G+MAA + KLGK AK EL GVPGASAT +RG GFH++
Sbjct: 125 DRSAEQGKVETLVASDNVKGGEMAAAFXADKI<3K!QAKVAEIiEGVPGASATRERGSGFHNI 184

Query: 194 AKSKIJ^ILSSQSANFDRAKAIl!m'Q^lMIQGHKDVQIIFAQ^roEMALGAaQAVK^ 253
 A KL +++ QSA+FDR K L +N++QGH D+Q +FA NDEMALGA +A+ S+G +++
Sbjct: 185 ADQKLQVVTKQSADFDRTKGLTVMENMiQGHPDIQAVFAHNDEIffiLQALEAINSSG-K^ 243

Query: 254 LIVGIDGQPDAHDAIKKGDISATIAQQPAKMGEIAIQAAIDYYKGKKVEKETISPIYLVTK 314
 L++G DG DA +IK +SAT+AQQP +G++A +AA D GKKV+K +P+ L T+
Sbjct: 244 LVIGFDGNKDALASIKDRKLSATVAQQPELIGKLATEAADDIIflGKKVQICriSAPLKLETQ 304

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 6478 (GBS203) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 12; MW 36.8kDa). 

GBS203-His was purified as shown in Figure 208, lane 8. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2093

A DNA sequence (GBSx2208) was identified in S.agalactiae <SEQ ID 6479> which encodes the amino
acid sequence <SEQ ID 6480>. This protein is predicted to be galactoside ABC transporter, permease
protein (rbsC). Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.15 Transmembrane 63 - 79 ( 52 - 85)
INTEGRAL Likelihood = -3.66 Transmembrane 111 - 127 ( 110 - 128)
INTEGRAL Likelihood = -2.71 Transmembrane 168 - 184 ( 168 - 188)
INTEGRAL Likelihood = -2.44 Transmembrane 189 - 205 ( 188 - 205)
INTEGRAL Likelihood = -0.80 Transmembrane 17 - 33 ( 17 - 33)

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9287> which encodes amino acid sequence <SEQ ID 9288> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15612 GB:Z99122 ribose ABC transporter (permease) [Bacillus subtilis] 
Identities = 144/211 (68%) , Positives = 182/211 (86%) , Gaps = 1/211 (0%) 

Query: 1 MGMtNGLFISyGKLAPFIVTLATMTIFRGATLVYSNGNPITAGLSDSFLFQFLGQGYIVG 60 
 +GM+NGL 1+ GK+APFI TIATMT+FRG TLVY++GNPIT GL ++ FQ G+GY +G

Sbjct: 113 LGMINGLLITKGKMAPFIATLATMTVFRGLTLVYTDGNPIT-GLGTNYGFQMFGRGYFLG 171 

Query: 61 IPFPVILMFLTFIILYILLHKTAFGKSVYALGGNEKAAYISGIKLNKVKIIIYTISGIMA 120 
IP P I M L F+IL++LLHKr FG+ YA+GGNEKAA ISGIK+ +VK++IY+++G+++ 
Sbjct: 172 lPVPAIT^mAFVILWOTlLHKTPFGRRTYAIGGNEKAALISGIKVTRVKVMIYSLAGLLS 231



Query: 121 SISGLIITSKLSSAQPTAGASYEMDAIAAWLGGTSLSGGKORIIGTLIGALIIGVLNNG 180 

+++G I+TSRL SAQPTAG SYE+DAIAAWLGGTSLSGG+GRI+GTLIG LIIG LNNG 
Sbjct: 232 ALAGAILTSRLHSAQPTAGESYELDAIAAWLGGTSLSGGRGRIVGTLIGVLIIGTLNNG 291 


Query: 181 LNIIGVSAFWQQWKGIVIimVLLDRFKVA 211 

EN++GVS+F+Q WKGIVIL+AVLLDR K A 
Sbjct: 292 LNLLGVSSFYQLWKGIVILIAVLLDRKKSA 322 

A related GBS gene <SEQ ID 8977> and protein <SEQ ID 8978> were also identified.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2094 

A DNA sequence (GBSx2209) was identified in S.agalactiae <SEQ ID 6481> which encodes the amino
acid sequence <SEQ ID 6482>. Analysis of this protein sequence reveals the following:

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.12 Transmembrane 75 - 91 ( 74 - 91) 
INTEGRAL Likelihood = -0.64 Transmembrane 96 - 112 ( 96 - 112) 






Final Results 

bacterial membrane --- Certainty=0.1447 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
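
The "Final Results" block of each localization report lists one certainty value per compartment, and the predicted location is the one with the highest value. As a minimal sketch (the function name `parse_final_results` and the regular expression are assumptions made for illustration, and the input is assumed to have had its spacing normalised), the block for SEQ ID 6482 could be read as follows:

```python
import re

# Hypothetical parser (an assumption, not part of the source analysis).
# Expects lines such as:
#   bacterial membrane --- Certainty=0.1447 (Affirmative) < succ>
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty=(\d+\.\d+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (compartment, certainty, call) tuples."""
    return [
        (m.group(1), float(m.group(2)), m.group(3))
        for m in RESULT_RE.finditer(text)
    ]


report = """
bacterial membrane --- Certainty=0.1447 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(report)
# The compartment with the highest certainty is taken as the prediction.
best = max(results, key=lambda r: r[1])
print(best)
```

Selecting by maximum certainty mirrors how the report itself marks one compartment as "Affirmative"; ties or all-zero reports (as seen for some sequences) would need separate handling.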

Example 2095 

A DNA sequence (GBSx2210) was identified in S.agalactiae <SEQ ID 6483> which encodes the amino 
acid sequence <SEQ ID 6484>. This protein is predicted to be ribose transport ATP-binding protein rbsa 
(rbsA). Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.00 Transmembrane 401 - 417 ( 401 - 417) 

Final Results 

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15611 GB:Z99122 ribose ABC transporter (ATP-binding protein)
[Bacillus subtilis]
Identities = 297/493 (60%), Positives = 375/493 (75%), Gaps = 1/493 (0%)



Query: 1 MKIDMRNISKSFGTNKVLEKIDLELQSGQIHALMGENGAGKSTLMNILTGLFPASTGTIY 60
 M+I+M++I K+FG N+VL + +L G++HZU:«GENGAGKSTLMNILTGL A G I
Sbjct: 1 MQIEMKDIHKTFGKNQVLSGVSFQMPGEVHALMGENGAGKSTLMNILTGLHKftDKGQIS 60

Query: 61 IDGEERTFSNPQEAEEFGISFIHQEMNTWPEMTVLEMLFIfiREIKTTFGLJajQKLMRQKA 120
 I+G E FSNP+EAE+ GI+FIHQE+N WPEMTVLENLF+G+EI + G+L + M+ A
Sbjct: 61 ING^IETYFSNPKEAEQH6IAFIHQEIJ!JIWPEMTVIlE^^^FIGKEISSKLGVLQTRKMKa» 120

Query: 121 LETFKRLGVTIPLDIPIGNLSVGQQQMIEIAKSLLNQLSILVMDEPTAALTDRETENLFR 180
 E F +L V++ LD G SVGQQQMIEIAK+L+ +++MDEPTAALT+RE LF
Sbjct: 121 KEQFDKLSVSLSLDQEAGECSVGQQQMIEIAKALMTNAEVIIMDEPTAALTEREISKLFE 180

Query: 181 VIRGLKQEGVGVVYISHRMEEIFKITDFVTVMRIX3VIVDTKETSLTNSDELVKKMVGRKL 240
 VI LK+ GV +VYISHRMEEIF I D +T+MRD6 VDT S T+ DE+VKKMVGR+L
Sbjct: 181 VITALKKNGVSIVyiSHRMEEIFAICDRITIMRDQKTVDTTNISETDFDEVVKKMVGREL 240

Query: 241 EDYYPEKHSEIGPVAFEVSNL-CGDNFEDVSFYVRKGEILGFSGLMGAGRTEVMRTIFGI 299
 + YP++ +G FEV N +FEDVSFYVR GEI+G SGLMGRGRTE+MR +FG+
Sbjct: 241 TERYPKRTPSLGDKVPEVKNASVKGSFEDVSPyVRSGEIVGVSGLMGAGRTEMMRALFGV 300

Query: 300 DKKKSGKVKIDDQEITITTPSQAIKQGIGFLTENRKDEGLimFNIKDNMTLPSTKDFSK 359
 D+ +G++ I ++ I P +A+K+G+GF+TENRKDEGL+LD +I++N+ LP+ PS
Sbjct: 301 DRLDTGEIWIAGiCKTAIKNPQERVKKGLGFITENRKDEGIilllTSIRHWIALPN^ 360

Query: 360 HGFFDEKTSTTFVQQLINRLYIKSGRPDLEVGNLSGGNQQKVVLAKWIGIAPKVLILDEP 419
 G D K FV LI RL IK+ P+ +LSGGNQQKW+AKWIGI PKVLILDEP
Sbjct: 361 KGLIDHKREAEFVDLLIKRLTIKTASPETHftRHLSGGNQQKVVIAKWIGIGPICVLILDEP 420

Query: 420 TRGVDVGAKREIYQMNEIADRGVPIVMVSSDLPEILGVSDRIMVMHEGRISGELSRKEA 479
 TRGVDVGAKREIY LMNEL +RGV I+MVSS+LPEILG+SDRI+V+HEGRISGE+ +EA
Sbjct: 421 TRGVDVGAKREIYTLMNELTERGVAIIMVSSELPEILGMSDRIIVVHEGRISGEIHAREA 480

Query: 480 DQEKVMQLATGGK 492
 QE++M LaTGG+
Sbjct: 481 TQERIMTLATGGR 493





There is also homology to SEQ ID 4678. 

SEQ ID 6484 (GBS407d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 147 (lane 2-4; MW 72kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 147 (lane 5 & 6; MW 47kDa). 

GBS407d-His was purified as shown in Figure 235, lane 9-10. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2096 

A DNA sequence (GBSx2211) was identified in S.agalactiae <SEQ ID 6485> which encodes the amino 
acid sequence <SEQ ID 6486>. This protein is predicted to be high affinity ribose transport protein rbsd
(rbsD). Analysis of this protein sequence reveals the following:

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2673 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15610 GB:Z99122 ribose ABC transporter (membrane protein)
[Bacillus subtilis]
Identities = 74/131 (56%), Positives = 95/131 (72%), Gaps = 1/131 (0%)

Query: 1 MKKTGiraSHLAKLRDDLGHTDRVCIGDLGLPVENGIPKIDLSLTSGIPSFQEVLDIYLE 60 
 MKK GIUSrSHLAK+ DLGHTD++ I D GLPVP+G+ KIDLSL G+P+FQ+ + E

Sbjct: 1 MKKHGIIjNSHLAKILADI/SHTDKIVIftDftGLPVPDGVLKIDLSLKTCLPAFQDTAAVLAE 60 

Query: 61 NILVEKVILAEEIKEANPDQLSRIiAKLDNSVSIEWSHNHLKQMTQDVKAVIRTGENTP 120 
+ VEKVI A EIK +N + ++ L L + lEY+SH K +T+D KAVIRTGE TP 
Sbjct: 61 EMAVEKVIAAAEIBCftSNQEN-AKFLENLFSEQEIEYLSHEEFKLLTBCDAKAVIRTGEFTP 119

Query: 121 YSNIILQSGVI 131 

Y+N ILQ+GV+ 
Sbjct: 120 YANCILQAGVL 130 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 2097 

A DNA sequence (GBSx2212) was identified in S.agalactiae <SEQ ID 6487> which encodes the amino
acid sequence <SEQ ID 6488>. This protein is predicted to be ribokinase (rbsK). Analysis of this protein 
sequence reveals the following: 



Possible site: 47

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15609 GB:Z99122 ribokinase [Bacillus subtilis]
Identities = 132/293 (45%), Positives = 177/293 (60%), Gaps = 4/293 (1%)

Query: 1 MSNIVIIGSISMDLVMETNRIAKEGETVFGQRFSMVPGGKGANQAVAIGRLSQERDNITI 60

M NI +IGS SMDLV+ +++ K GETV G F VPGGKGftNQAVA RL + + + 
Sbjct: 1 MRNICWIGSCSimLVVTSDKRPKAGETVLGTSFQTVPGGEGANQATOAARLGAQ---VFM 57 

Query: 61 IXSaiGEDSFGPILIJMJIJIIKtmVTrDFVGTIP-SSSGVAQITLYNiroNRIIYCPGA^ 119 
 +G +G+D +G +L+NL N V TD++ + + SG A I L DN 1+ GftN +

Sbjct: 58 VGKVGDDHYGTAIIiNNLKANGWTDYMEPVTHTESGTAHIVLAEGDNSIVVVKGANDDIT 117 

Query: 120 TKKWSQEWSIIKEADLVVLQNEIPHQANMKIANFCKEHSIKLLYNPAPSRETDIEMIjDKV 179 
I++ D+V++Q EIP + ++ +C H I ++ NPAP+R E +D 
Sbjct: 118 PAYAIOilAIjEQIEKVDMVLIQQEIPEETVDEVCKyCaiSHDIPIILNPAPARPLKQETIDHA 177

Query: 180 DYFTP^IEHECQELFPNQKLEDIIATYPEKIlIVTLGTKGAIYSDGKESHLIPALETKAVDT 239 

Y TPiSJEHE LFP + + LA YP KL +T 6 +G YS G + LIP+ + VDT 
Sbjct: 178 TYLTPJffiHEASILFPELTISEAIALYPAKLFITEGKQGVRYSAGSKBVLIPSFPVEPVDT 237 


Query: 240 TGAGDTFNGAFGYAISKKFKIAKALRFATLAAHLSVQKFGAQGGMPTIKEMED 292 

TGAGDTFN AF A+++ I ALRFA AA LSV FGAQGGMPT E+E+ 
Sbjct: 238 TGAGDTFNAAFAVALAEGKDIEAALRFANRAASLSVCSFGAQGGMPTRNEVEE 290 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2098 

A DNA sequence (GBSx2213) was identified in S.agalactiae <SEQ ID 6489> which encodes the amino 
acid sequence <SEQ ID 6490>. Analysis of this protein sequence reveals the following:

Possible site: 31 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2272 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9477> which encodes amino acid sequence <SEQ ID 9478> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15608 GB:Z99122 transcriptional regulator (LacI family) 
[Bacillus subtilis] 
Identities = 141/327 (43%) , Positives = 204/327 (62%) , Gaps = 4/327 (1%) 


Query: 13 MSTIRQVAEKAGVSTSTVSRYISQNGYVSQKASQKIEQAIRELHYVPNFLAQSLKTKKNQ 72 

M+TI+ VA AGVS +TVSR ++ NGYV ++ ++ A+ +L+Y PN +A+SL ++++ 
Sbjct: 1 MATIKDVAGRAGVSVATVSRNUTONGYVHEETRTRVXAAMAKIJTYyPNE^ 50 

Query: 73 LVGUjLPDISNPFFPRLARGVEEFLKEQGYRVMLGNTtmKSHLEEEYIilWL^^ 132

L+GLLLPDI+NPFFP+LARG E+ L +eYR++ GN++ + g EYL Q++ A6II 
Sbjct: 61 LIGLLLPDITOPFFPQLARGAEDELNREGYRLIFGNSDEELKKEDEYLQTFKQNHVAGII 120 

Query: 133 --TTHDFTKNHPEIDIPVWVDRVNQETQYGVFSDNKEGGKLAAQAIWTAGATNILLIRG 190 
 T + + + ++ PW +DR E V SD G KLAAQRI + I L+RG

Sbjct: 121 AATNYPDLEEYSGMNYPWFLDR-TLEGAPSVSSDGYTGVKLAAQAIIHGKSQRITLLRG 179 



Query: 191 PLDKADNMIQRFQGSQNYLIiNKGACFAIEDSASFDFAEIQIEAKTLLDHHPDIDSIIAPS 250 
 P RF G+ L F + ++ASF + Q AK L +P D +IA +

Sbjct: 180 PA-HLPTAQDRKNGALEILKQAEVDFQVIETASFSIKDAQSMAKELFASYPATDGVIASN 238 

Query: 251 DIHAIAYLHEILNRGKRIPEDVQIIfiYDDILMSQFIYPSLSTIHQSSYIMGQKAAELIFK 310 
 DI A A LHE L RGK +PED+QIIGYDDI S ++P LSTI Q +Y MG++AA+L+

Sbjct: 239 DIQAAAVLHEALRRGKNVPEDIQIIGYDDIPQSGLLFPPLSTII^PAYDMGKEAAKLLLG 298 

Query: 311 ITNQLPITNKRIKLPVHYVERETLRRK 337 
I + P+ I++PV Y+ R+T R++ 
Sbjct: 299 IIKKQPIiRETAIQMPVTYIGRKTTRKE 325

A related DNA sequence was identified in S.pyogenes <SEQ ID 6491> which encodes the amino acid
sequence <SEQ ID 6492>. Analysis of this protein sequence reveals the following:

Possible site: 35 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1657 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 232/328 (70%) , Positives = 274/328 (82%) 

Query: 10 GVSMSTIRQVAEKAGVSTSTVSRYISQNGYVSQKASQKIEQAIRELHYVENFIjAQSLKrK 69

G +M TI+QVBE+AGVS STVSRYISQ GYVS A KI+ AI +LHy PN lAQSLKTK 
Sbjct: 14 GKA^OT'IKC3VAEEAGVSRSTVSRYISQKiGYVSDImHKIKAAlAKLHYTPNVLAQSLCT^ 73 

Query: 70 KNQLVGLLLPDISNPFFPRLARGVEEFLKEQGYRVMIX3NTNNKSHLEEEYI^^ 129 
 KNQLVGLLLPDISNPFFPRLARG EE+LKE+GYRVMLGN ++ LEEEY++VLLQSNAA

Sbjct: 74 KNQLVGLLLPDISNPFFPRLARGAEEYLKEKGYRVMLGNISDSEALEEEYVHVLLQSNAA 133 

Query: 130 GIITTHDFTKNHPEIDIPVVVVDRVNQETQyGVFSDNKEGGKIAAQAIWTAGATNILLIR 189 
GIITTHDFTK +P + IPWWDRV+QETQYGVFSDN+ GG LAAQ +W AGA +LLIR 
Sbjct: 134 GIITTHDFTKRYPTLAIP\AAA7DRVDQETQYGOTSDNRAGGLIAAQTVWQAGAKEVLLIR 193

Query: 190 GPLDKADNtNQRFQGSQNYLIiNKGACFAlEDSASFDFAEIQIEAKTLLDHHPDIDSIIAP 249 

GPLD A+N+N+RF+ S +YL + + DS +FDF IQ+EA L +P IDSIIAP 

Sbjct: 194 GPLDNAENINERFEASFSYLQKQDVTMYVCDSQNFDFESIQLEASYNLKCYPTIDSIIAP 253 


Query: 250 SDIHAIAYLHEILNRGKRIPEDVQIIGVDDILMSQFIYPSLSTIHQSSYIMGQKAAELIF 309 

SDIHAIAY+HB+ ++GK+IP+DVQIIGYDDILMSQFIYPSLSTIHQSSY+MG+ AAEL++ 
Sbjct: 254 SDIHAIAYIHELHSQGKKIPQDVQIIGYDDILMSQFIYPSLSTIHQSSYLMGRYAAELVY 313 

Query: 310 KITNQLPITNKRIKLPVHYVERETLRRK 337

I +QI. + RIKLPVHYVERET+R++ 
Sbjct: 314 TIASQLTVKANRIKLPVHYVERETIRKR 341 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2099 

A DNA sequence (GBSx2214) was identified in S.agalactiae <SEQ ID 6493> which encodes the amino 
acid sequence <SEQ ID 6494>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.80 Transmembrane 27 - 43 ( 24 - 51) 

INTEGRAL Likelihood =-10.61 Transmembrane 337 - 353 ( 329 - 362) 

INTEGRAL Likelihood = -9.18 Transmembrane 257 - 273 ( 249 - 276) 

INTEGRAL Likelihood = -8.92 Transmembrane 302 - 318 ( 291 - 326) 

Final Results

bacterial membrane --- Certainty=0.6519 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8979> which encodes amino acid sequence <SEQ ID 8980> 
was also identified. Analysis of this protein sequence reveals the following: 



Crend: 6 



3.20 



Lipop Possible site: -3 
SRCFLG: 0 

McG: Length of UR: 
Peak Value of UR: 
Net Charge of CR: 1 
Di scrim Score: ( 
Signal Score (-7.5): 0. 
Possible site: 46 
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 47 



McG: 
GvH: 



.06 
.0500002 



ALOM program count: 3 value: -10.61 threshold: 0.0
INTEGRAL Likelihood =-10.61 Transmembrane 326 - 342 ( 318 - 348)
INTEGRAL Likelihood = -9.18 Transmembrane 246 - 262 ( 238 - 265)
INTEGRAL Likelihood = -8.92 Transmembrane 291 - 307 ( 280 - 315)
PERIPHERAL Likelihood = 4.98 152

modified ALOM score: 2.62
icm1 HYPID: 7 CFP: 0.525

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5246 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF12525 GB:AE001863 hypothetical protein [Deinococcus radiodurans] 
Identities = 103/352 (29%) , Positives = 191/352 (54%) , Gaps = 9/352 (2%) 



Query: 15 AWKELTFYKKKYLLIELLIIVMMFMWFLSGLANGLGRAVSAAIENNPAQTYILNEGAEQ 74
 A +EL K + LLI ++ ++ FMV L+GL GL R ++ + + PAQ+++ + A+
Sbjct: 4 ALRELQHQKLRSLLIGGIVALIAFMVFMLTGLTRGLSRDSASLLLDTPAQSFVTTKEADG 63

Query: 75 VITSSVLTTKDQTDLNSLNLKDSTTLNIQRSSLTRQGHEKKIDISYFAIDKDSFMAPTLS 134
 V+ S L+ + +++L + ++ ++ +K++ +D F+AP +S
Sbjct: 64 VLNRSFLSPEQ- - -VSALQQDNEDAAAFAQTFVSFSHGDKQLSGVLLGVDPRGFLAPDVS 120

Query: 135 EGKQLTSYKKAIILNDSLKAEGIKLGDKVIDKSSSISLTWGFVHNSMYGHGPVAFIDKD 194
 Ea+ li A++ ++SL+ +G+K+GD + K S L V GF ++ H P ++
Sbjct: 121 EGQTLRVAGGAW-DESLREDGVKVGDVLTLKPSGDQLRVSGFTRSARLNHQPGMYVSLA 179

Query: 195 lYTEINKKINPQYQFLPQALVMKNDKSISHLP-TQLEAVSKKDVIQHIPGYSAEQSTLNM 253
 + +K+NP+ A+ + + +L L ++ +Q +PGy EQ +L M
Sbjct: 180 RW QKLNPRMHGTVNAVALPAAPAQVNLGGADLSVTNRAQTLQVLPGYKEEQGSLTM 235

Query: 254 ILWVLWASAGILGVFFYI XTLQKRHEFSVMKAIGTKMSEIALFQLSQVI ILALFGI IVG 313
 I L+ +A +L FFY++TLQK +F ++KAIG +A ++Q++IL L + +
Sbjct: 236 IQVFLIAVAAFVLATFFYVMTLQKTAQFGLLKAIGASNRTLAGSVVAQMLILTLLAVAIA 295

Query: 314 DGLAVALSYVLPAQMPFVINWQNIILVSFVFLVIAMISSALSIVKVAKIDPV 365
 + + + +LPA MPF + NI S + LV+A ++S LS+ +VAK+DP+
Sbjct: 296 AAVTLGMVQLLPAGMPFHLTAANIASASGLLLWAALASLLSVRRVAKVDPL 347





A related DNA sequence was identified in S.pyogenes <SEQ ID 6495> which encodes the amino acid 
sequence <SEQ ID 6496>. Analysis of this protein sequence reveals the following: 



Possible site: 58 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-12.31 Transmembrane 246 - 262 ( 233 - 270)
INTEGRAL Likelihood = -8.49 Transmembrane 327 - 343 ( 321 - 351)
INTEGRAL Likelihood = -1.01 Transmembrane 301 - 317 ( 301 - 317)

Final Results

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF12525 GB:AE001863 hypothetical protein [Deinococcus radiodurans] 
Identities = 101/360 (28%) , Positives = 175/360 (48%) , Gaps = 11/360 (3%) 

Query: 1 MFLALNEMKQSKLRYGLIAGLLCLVAYLMFFLSGLAFGLMQEHRSAVDLWKADSVLLAKD 60

M+LAL E++ KIiR LI G++ L+A+++F L+GL GL +++ S + A S + K+ 
Sbjct: 1 MyLALRELQHQKLRSLLIGGIVALIAFM\miLTGLTRGLSRDSASLLIiDTPAQSFVTTKE 60 

Query: 61 ADATLTLSQVSRAQENQITADKyAPLAQLNTVAWSVKNPKnADKVKVSLPGIDSNSFIRP 120 

 ADLS+SQ + +D AT KVL G+D F+ P

Sbjct: 61 ADGVLNRSFLSPEQVSALQQDNEDAAAFAQTFVSFSHGDKQLSGV---LLGVDPRGFLAP 117 

Query: 121 NIVKGRLFKraKEVVLDQSLAKEEAFAIGKDFYTSSSSQALTIVGYTQNARFSVAPVV™ 180 
++ +G+ + V+D+SL +E+ +G S L + G+T++AR + P +Y+ 

Sbjct: 118 DVSEGQTLRVAGGAVVDESL-REDGVKVGDVLTLKPSGDQLRVSGFTRSARIOTQPGMYV 176

Query: 181 NLEAFETLKyGEPLPKDKQVVNAFITKGS--LTDyPKKDFQKLDIKrFITKLP6YSAQLL 238 

+L ++ L P+ VNA + + D + + LPGY + 

Sbjct: 177 SLARWQKLN PRMHGTVNAVALPAAPAQVNLGGADLSVTNRAQTLQVLPGYKEEQG 231 

Query: 239 TFGFMISFLVIISAIIIGIFMYILTIQKAPIFGIMKAQGISNKTITTAVLMQTFFLSFLG 298
+ + FL+ ++A ++ F Y++T+QK FG++KA G SN+T+ +V+ Q L+ L 
Sbjct: 232 SLTMIQVTliIAVAAFVIATFFYVm'LQKTAQFGLLKAIGaiSmTLAGSVVAQMLILTLIA 291 

Query: 299 SGLGLLGTWLTSLLLPTWPFQSNWFLYLAIFVSMICFALLGTLFSVFNIIRIDPLKAIG 358

+ T LLP +PF + ++ A L +L SV + ++DPL A+G 

Sbjct: 292 VAIAAAVTLGMVQLLPAGMPFHLTAANIASASGLLLWAALASLLSVRRVAKVDPLIALG 351 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 96/356 (26%), Positives = 178/356 (49%), Gaps = 4/356 (1%)

Query: 15 AWKELTFYKKKYLLIELLIIVMMFMWFLSGLANGLGRAVSAAIENNPAQTYILNEGAEQ 74 

A E+ K +Y LI L+ ++ +++ FLSGLA GL + +A++ A + +L + A+ 
Sbjct: 4 ALNEMKQSKLRYGLIAGLLCLVAYLMFPLSGIAFGLMQENRSAVDLWKADSVIiAKn^ 63 

Query: 75 VITSSVLTTKDQTDnNSLNLKDSTTLNIQRSSLTRQGHEKKIDISYFAIDKDSEMAPTLS 134 

+T S ++ + + + + LN S+ K+ +8 F ID +SF+ P + 

Sbjct: 64 TLTLSQVSRAQENQITADKVAPLAQIOTVAWSVKNPKDADKVKVSLEV3IDSNSFIRPNIV 123 

Query: 135 EGKQLTSYKKAIILNDSLKAEGIKLGDKVIDKSSSISLTWGFVHNSMYGHGPVAFIDKD 194

+G+ + K+ ++ K E +G SSS +LT+VG+ N+ + PV +++ + 

Sbjct: 124 KGRLFKTNKEVVLDQSLAKEEAFAIGKDFYTSSSSQALTIVGYTQNARFSVAPWYMNLE 183 

Query: 195 lYTEIN-KKINPQYQPLPQALVMKNDKSISHLPTQ-LEAVSKKDVIQHIPGYSAEQSTIiN 252 
 +++P+ + +A + K S++ P + + + K I +PGYSA+ T

Sbjct: 184 AFETLKYGEPLPKDKQVVNAFITRG--SLTDYPKKDFQKLDIICrFITKLPGYSAQLLTFG 241 

Query: 253 MILWVLWASAGILGVFFYIITLQKRHEFSVMKAIGTKMSEIALFQLSQVIILALFGIIV 312 
++ LV+ SA I+G+F YI+T+QK F +MKA G I L Q L+ G + 

Sbjct: 242 FMISPLVIISAIIIGIFMYILTIQKAPIFGIMKAQGISNKTITTAVLMQTFFLSFLGSGL 301

Query: 313 GDGLAVALSYVLPAQMPFVINWQNIILVSFVFLVIAMISSALSIVKVAKIDPVEVT 368 

G S +LP +PF NW + + + A++ + S+ + +IDP++ I 

Sbjct: 302 GLLGTWLTSLLLPTWPFQSNWFLYLAIFVSMICFALLGTLFSVENIIRIDPLKAI 357 

SEQ ID 8980 (GBS239) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 175 (lane 13; MW 64kDa). 

GBS239-GST was purified as shown in Figure 227, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2100 

A DNA sequence (GBSx2215) was identified in S.agalactiae <SEQ ID 6497> which encodes the amino 
acid sequence <SEQ ID 6498>. This protein is predicted to be heterocyst maturation protein (devA) 
(b0879). Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1751 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA05977 GB:AJ003195 ATP-binding subunit [Anabaena variabilis]
Identities = 87/225 (38%), Positives = 146/225 (64%), Gaps = 1/225 (0%)

Query: 3 AILELKHISKHYPDGDELLSILDNLDLSVSAGEFVAILGPSGSSKSTLLSIASLLLGaDQ 62 

A++ +K ++ +Y G IL +++L + GE V + GPSGSGK+TLLS+ G L + 

Sbjct: 5 AVmKS]aiIHYYGKGALKRQILFDINLEIYPGEIVIMTGPSGSQICrTLI.SLIGGLRSVQE 64 

Query: 63 GSLYVNHENVTDLSQRQRTQLRREALGFIFQSHQLLPYLTIQEQLQQEARFAICHYDKKrS 122 

G+L ++ SQ + Q+RR ++G+IFQ+H LL +LT ++ +Q +H ++ + 

Sbjct: 65 GNLQFIfiVELSGASQNKLVQIiai-SIGYIFQAHBILLGFLTARQNVQNIAVELNEHISQEEA 123 

Query: 123 LEEINKLLSDLGIEQCAHKYENQLSGGQKQRAAIARAFINHPKVILftDEPTASLDEERGR 182

+ + +L +G+E YP+ LSGGQKQR AIRRA +N+P ++LM3EPTA+LD++ GR 

Sbjct: 124 lAKAEAMLKAVGLENRVDYYPDNLSGGQKQRVAIARALVNNPPLVLADEPTAALDKQSGR 183 

Query: 183 QVTELIRQEVKSHNTAAIMVTHDERVLDLVDTVYRLKDGKLVKEN 227 
V E++++ K T+ ++VTHD R+I1D+ D + ++DG L +++

Sbjct: 184 nWEIMQRLAKDQGTSILLVTHDNRILDIADRIVEMEDGILARDS 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6499> which encodes the amino acid 
sequence <SEQ ID 6500>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4181 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 103/224 (45%), Positives = 149/224 (65%), Gaps = 4/224 (1%)

Query: 3 AILELKHISKHYPDGDELLSILDNLDLSVSAGEFVAILGPSGSGKSTLLSIAGLLLGADQ 62 

++L K ++K + DG ++ L D S+ AGEFVAI+GPSGSGKST L+IAG L 
Sbjct: 3 SVLTFKQVTKTFQDGHHEINALKATDFSIEAGEFVailGPSGSGKSTFLTIAGGLQTPSS 62 



Query: 63 GSLYVNHENVTDLSQRQRTQIjaiEALGFIFQSHQLLPYLTIQEQI«3EARPAK3m)KKTS 122
G h -++• + T LS+++R++IjR +++GFI Q+ L+P+ T+Q+QL+ H
Sbjct: 63 GQLIIDGTDYTHLSEKERSRLRFKSVGFILQASNLIPFSTVQQQLE LVDHLTGSKE 118 

Query: 123 LEEINKLLSDLGIEQCAHKYPNQLSGGQKQRAAIARAFINHPKVILADEPTASLDEERGR 182 
+ N+L DLGI H+ P +LSGG++QRAAIARA + P +ILftDEPTASIiD E+

Sbjct: 119 KaKaNQLFDDLGITGLKHQLPQELSGGERQRAAIJ!JUUiYHDPALliaDEPTASLDTEKft.Y 178 

Query: 183 QOTELIRQEVKSHNTAAIMVTHDERVLDLVDTVYRr.KDGKLVKE 226 
+V +L+ +E K N A IMVTHD+R+L D VYR++DG+L +E 
Sbjct: 179 E\A?KtiIAKESKEKN10iIIiyWTHDDRMI)KSCDKVyRMQDGELCQE 222

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
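
The percentages quoted with the identity, positive and gap counts can be recomputed from the counts themselves. The following Python sketch is illustrative only and is not part of the analysis described in this example; the function names are hypothetical. It assumes, from the figures quoted above (for instance 103/224 reported as 45%), that percentages are truncated rather than rounded. The rule used for the Positives percentage (identity percentage plus the truncated percentage of non-identical positives) is an empirical assumption that reproduces the figures quoted in this section, not a documented property of the alignment program.

```python
import re

# Matches e.g. "Identities = 103/224 (45%)"; tolerant of stray spaces.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return {name: (count, aligned_length, reported_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in STAT_RE.finditer(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated (not rounded)."""
    return (100 * count) // length


def positives_percent(identities, positives, length):
    """Assumed rule: identity % plus truncated % of non-identical positives."""
    return truncated_percent(identities, length) + truncated_percent(
        positives - identities, length
    )


line = "Identities = 103/224 (45%), Positives = 149/224 (65%), Gaps = 4/224 (1%)"
stats = parse_alignment_stats(line)
ident, length, reported_ident = stats["Identities"]
pos, _, reported_pos = stats["Positives"]
gaps, _, reported_gaps = stats["Gaps"]

print(truncated_percent(ident, length), reported_ident)
print(positives_percent(ident, pos, length), reported_pos)
print(truncated_percent(gaps, length), reported_gaps)
```

A check of this kind is mainly useful for spotting digits that were corrupted when a statistics line was transcribed.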

Example 2101 

A DNA sequence (GBSx2216) was identified in S.agalactiae <SEQ ID 6501> which encodes the amino
acid sequence <SEQ ID 6502>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2645 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB64972 GB:AJ012050 VicR protein [Enterococcus faecalis]
Identities = 86/229 (37%), Positives = 132/229 (57%), Gaps = 10/229 (4%)

Query: 3 KILVVEnNIVQQKIITTKLTQEGyQFITASNGQEaiiNCLDTEEVQLIITDimPMMDGYQ 62
KILW+D +1+ L +EGy+ TA +6+EM1 ++ E LII D+M+P MDG +
Sbjct: 52

Query: 63 LIQELRSAAYNVPIIVMTAKSQMEDMTKGFGLGADDYMVKPVQLQELALRIKALLRR 119
+ +E+R +++PII++TAK D G LGADDY+ KP +EL R+KA LRR
Sbjct: 112

Query: 120
A + Q +L IG+ ++ D + ++I +EF +L++L + ++ TR L
Sbjct: 171

Query: 176 LDSIWGMDTDLDERWDACINKIRRKVEHLPDFK- - lETVRGVGYRAKN 222
L ++WG D D R VD + ++R K+E P + T RGVGY +N
Sbjct: 231

There is also homology to SEQ ID 1182.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 



Example 2102 

A DNA sequence (GBSx2217) was identified in S.agalactiae <SEQ ID 6503> which encodes the amino 
acid sequence <SEQ ID 6504>. This protein is predicted to be a sensor protein. Analysis of this protein
sequence reveals the following: 

Possible site: 38 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.97 Transmembrane 53 - 69 ( 47 - 77)

Final Results

bacterial membrane Certainty=0.4588 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC62214 GB:AF049873 sensor protein [Lactococcus lactis] 
Identities = 97/307 (31%), Positives = 169/307 (54%), Gaps = 16/307 (5%)

Query: 57 SAIAWFLSLVIASISl^mGSYHLTKPILDISHIVSNVaDGDFEGHIYRNSNRRKSYEYY 116
+ LR.V4- +L++ + S++Y + 4-T+P+L I +A GD + N+ 
Sbjct: 170 AVLAVI--TLIVTAFSIFYITRTVTRPLLKIKLGTDKIAC3GDLSIQIJJVNTE 219 

Query: 117 NELDELSESINQMIVSLSHMDHMRKDFITNVSHELKTPIAAVANIVELLQDPELDEETQS 176 
+EL EL++S1 + L M R +F+++V+HEI1+TP+ + ++ E ++

Sbjct: 220 DELGELAKSIEDLAEKLDFMKRERNEFLSSVAHEIOITPLTFIKGYADIANRSTTSLEDKT 279 

Query: 177 ELLGLVKTESLRLTRLCDTMLQlERVDNQETIGEl^STOVDEQIRQaMISLTERWQftKRI 236 
+ L +++ ES LT+L + ++ +++++ E V + EI + +++ + KRI 

Sbjct: 280 QYLRIIREESRHLTQimDLMNLAQLEENGPKTOKHQVLIQELINEWSKVSGVFSEK^^ 339

Query: 237 NFQLDSKPYTVYSNSDLLM--QVWINLIiDNalKySEDIVDLSVRMEETNNHYLRVIISDK 294 

NF L S Y+N D + QV +NLL NA KYS D D+ + ++ +++ISDK 

Sbjct: 340 NF-LISGEGNFYANIDFMRIEQVLVNLIMSIAYKYSADESDIKLAFIPEKENF-KIVISDK 397 


Query: 295 GRGISQY0VQHIFDKFYQaDQSHNQQ--GNGIiGLAIVKRIIVLCKGRISVSSQLEIGTEF 352 

G GI + D+ +1F++PY+ D+S + G GLGLAIV+ 1+ G+I V S GT F 
Sbjct: 398 GEGIPEQDLPYIFERFYRVDKSRTRTTGGVGIX3LAIVQDIVKKHNGKIIVESIQNQGTTF 457 

Query: 353 CVELPLS 359

+E1,P S 
Sbjct: 458 IIEIiPYS 464 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8981> and protein <SEQ ID 8982> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 4.84 
GvH: Signal Score (-7.5): 0.179999

Possible site: 35
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -8.97 threshold: 0.0

INTEGRAL Likelihood = -8.97 Transmembrane 50 - 66 ( 47 - 77)
PERIPHERAL Likelihood = 1.27 324

modified ALOM score: 2.29

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4588 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

31.9/57.3% over 293aa

Lactococcus lactis

GP|3687664| sensor protein Insert characterized

ORF01881(478 - 1377 of 1677)

GP|3687664|gb|AAC62214.1||AF049873 (171 - 464 of 464) sensor protein {Lactococcus lactis}
%Match = 12.9
%Identity = 31.9 %Similarity = 57.3

Matches = 94 Mismatches = 121 Conservative Sub.s = 75

339 369 399 429 459 489 519 549 

MTKLRRFRFPLRFYFTLMFVLTMLFSVIiaSLIiLVAAIVFTFFQGVLTTHVLQVSJUa.VV^

I :: : : : | :: : :: : : ::|:: : |::| : :|:| 

EKKNKKESIiHFHWIXSDKYIVSKSRIQSNGKIVGSVYMFLSTRPIQKMVENFTGIPAVLAVITLIOTAFSIFYITRT^ 
130 140 150 160 170 180 190 

579 609 639 669 699 729 759 789

ILDISHIVSWADGDFEGHIYRNSNRRKSYKmffiLDELSESINQMIVSLSHMDHMRKDFITIWSHELKTPIAAVANIVE 
:| I :| II: :: |: :|| ||::|| : | | | : | : : : | : | | 1 : | | = : = 

LLKIKLGTDKIAQGDLSIQUraSITE DELGELAKSIEDLAEKLDFMKRERNEFLSSVaHELRTPLTFIKGYAD 

210 220 230 240 250 260

819 849 879 909 939 969 999 1029 

LLQDPELDEETQSELLGLVKTESLRLTRLCDTMLQMSRVDNQETIGELSSVRVDEQIRQAMISLTERWQAKRINFQLDSK 



liUSIRSTTSLEDKTQYLRIIREESRHLTQLMEDIJIin^LEENGFKVEKHQVLIQELINEWS 
280 290 300 310 320 330 340

1059 1083 1113 1143 1173 1203 1233 

PYT\nrSNSDLI.--MQWINLLDNAIK^SEDIVDLSVRMEETNNHYLRVIISDKGRGISQYDVQHIFDKFYQADQSHNQQ- 

1 = 1 !:: II =111 II III I h = :: = lllll II = 1= Hl-lh hi = 

EGNFYANIDFMRIEQVLVNIiMNAYKYSADESDIKIAFIPEKENF-KIVISDKGEGIPEQDLPYIFERFYRVDKSRTRTO
360 370 380 390 400 410 420 

1287 1317 1347 1377 1407 1437 1467 1497 

-GNGLGIAIVKRIIVLCKBRISVSSQLEIGTEFCVELPLS*LFKTITA1TOQI^FYLFRNKYTKNRQKL* 

Mllllll: |: Mil III :|ll I ,

GGVGLGIAIVQDIVKKHNGKIIVESIQNQGTTFIIELPYS 
440 450 460 

SEQ ID 8982 (GBS170d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 181 (lane 4; MW 35kDa) and in Figure 123 (lane 5-7; MW 35kDa). It was also 
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
123 (lane 2-4; MW 60kDa) and in Figure 184 (lane 3; MW 60kDa). Purified GBS170d-GST is shown in 
Figure 243, lane 7; purified GBS170d-His is shown in Figure 234, lanes 5-6. 
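
The transmembrane prediction lines quoted in this example (for instance `INTEGRAL Likelihood = -8.97 Transmembrane 53 - 69 ( 47 - 77)`) have a regular layout that can be read programmatically. The following Python sketch is a hedged illustration, not part of the original analysis; `parse_integral_line` and `span_length` are hypothetical names, and the two residue ranges are simply returned as the first range and the parenthesised range, without any assumption about how the prediction program defines them.

```python
import re

# Tolerant of the irregular spacing seen in the listings.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_line(line):
    """Return (likelihood, first_range, parenthesised_range) or None."""
    m = TM_RE.search(line)
    if m is None:
        return None
    likelihood = float(m.group(1))
    first_range = (int(m.group(2)), int(m.group(3)))
    paren_range = (int(m.group(4)), int(m.group(5)))
    return likelihood, first_range, paren_range


def span_length(span):
    """Number of residues in an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


parsed = parse_integral_line(
    "INTEGRAL Likelihood = -8.97 Transmembrane 53 - 69 ( 47 - 77)"
)
print(parsed)
print(span_length(parsed[1]), span_length(parsed[2]))
```

For the line above, the first range covers 17 residues and the parenthesised range 31 residues.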

Example 2103 

A DNA sequence (GBSx2218) was identified in S.agalactiae <SEQ ID 6505> which encodes the amino 
acid sequence <SEQ ID 6506>. Analysis of this protein sequence reveals the following:

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.0502 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06906 GB:AP001518 argininosuccinate synthase
(citrulline-asparate ligase) [Bacillus halodurans]
Identities = 262/396 (66%), Positives = 321/396 (80%), Gaps = 1/396 (0%)

Query: 1 MGKEKLILAYSGGLDTSVAIAWLK-KDYDVIAVCMDVGEGKDLDFIHDKALTIGAIESYI 59 
M K+K++LAYSGGLDTSVAI WL K YDVIAV +DVGEGKDL+F+ +KAL +GAIESY

Sbjct: 1 MSKKKVVIAYSGGII)TSVAIKWLSDK6YDVIAVGIiDVGEGKDLEFVKEKftLKVGAIESYT 60 

Query: 60 LDVKDEFAEHFVLPALQAHAMYEQKYPLVSMiSRPIIAQKLVEMAHQTGATTIAHGCTGK 119 
+D K EFAE FVLPALQAHA+YEQKYPLVSALSRP+I++KLVE+A QTGA +AHGCTGK 
Sbjct: 61 IDAKKEFAEEFVLPALQAHRLYEQKYPLVSALSRPLISKKLVEIAEQTGAQAVaHGCTGK 120

Query: 120 GNDQWFEVAIAALDPELKVIAPVREWKWHREEEITFAKANGVPIPADLDNPYSIDQNLW 179
GNDQVRFEV+I AL+P L+V+APVREW W R+EEI +AK N +PIP DLDNPYS+DQNLW
Sbjct: 121 GlSTOQVRFEVSIQWaiENLEVLaPVREmWSRDEEIEYAKKmiPIPIDLDNPYSVDQm 180

Query: 180 GRJaroCGVLENPWNQAPEEAFGITKSPEERPDCMYIDITFQNGKPIAINNQEMTLRDLI 239
GR+NECG+LE+PW PE A+ +T + E+APD E ++I F+ G P+ +N + + +LI
Sbjct: 181 GRSNECGILEDPWATPPEGAYELTVAIEDAPDQPEIVEIGFEKGIPVTIiNGKSYPVHELI 240

Query: 240 LSLNEIAGKHGIGRIDHVENRLVGIKSREIYECPAAMVLLAAHKEIEDLTLVREVSHFKP 299
L m+IAGKHG+GRIDHVENRLVGIKSRE+YECP AM L+ AHKE+EDLTL +EV+HPKP
Sbjct: 241 LEUJQIAGKHGVGRIDHVEmiLVGIKSREVYECPGaMTLIKAHKELEDLTLTKEVaHFKP 300

Query: 300 ILENELSNLIYIiaLWFSPATKAIIAYVKETQKVWGTTKVKLYKGSAQVVaRHSSNSLYD 359
++E +++ LIY LWFSP A+ A++KETQ V G +VKL+RG A V R S SLY+
Sbjct: 301 VVEKKIAELIYEGLWFSPI^PALSAFLKETQSTVTGVVRVKLFKGHAIVEGRKSEYSLYN 360

Query: 360 ENLATYTAADSFDQDAAVGFIKLWGLPTQVNAQVNK 395
E lATYT D FD +AAVGFI LWGLPT+V + VNK
Sbjct: 361 EKLATYTPDDEFDHNAAVGFISLWGLPTKVYSMVNK 396





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2104 

A DNA sequence (GBSx2219) was identified in S.agalactiae <SEQ ID 6507> which encodes the amino 
acid sequence <SEQ ID 6508>. This protein is predicted to be argininosuccinate lyase (argH). Analysis of 
this protein sequence reveals the following:

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2131 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06905 GB:AP001518 argininosuccinate lyase [Bacillus halodurans] 
Identities = 284/454 (62%), Positives = 350/454 (76%)



Query: 6 KliWGGRFESSLEKWVEEFGASISFDQKLAPYDMKASMaHVTMLGKrDIISQEEAGLIKDG 65
KLWGGRF + E WV+EFGASI FDQ+L D++ S+AHVTML K+ I++ EE IK G
Sbjct: 3 KLWGGRFTKTAEAWVDEFGASIGFDQQLVEEDIEGSLAHVTMLEKSGILANEEVEQIKRG 62

Query: 66 LKILQDKYRAGQLTFSISNEDIHMNIESLLTAEIGEVAGKLHTARSRNDQVATDMHLYLK 125
L IL +K + G+L +S++NEDIH+NIE LL EIG V GKLHT RSRNDQVATDMHLYL+
Sbjct: 63 LHILLEKAKKGELSnfSVANEDIHnNIEKLLIDEIGPWMKIiHTGRSRiroQVATDm^ 122

Query: 126 DKLQE^lMKKLLHLRTTLVNIlAElraIYTVMPGYTHI<3HAQPISF6HHLMAYY™ 185
+ +E+++ + +++ LV A+ H+ T+H-PGYTHLQ AQPISF HHL+AY+ M RD R
Sbjct: 123 KQTKEILQLVKNVQAALVEQAKQHVETLIPGYTHIORAQPISFAHHLLAYFWMLERDY 182

Query: 186 IiEFJSMKHTNLSPLGAAALAGTTFPIDRHMTTRLLDFEKPYSNSLDAVSDRDFIIEFLSNA 245
E ++K N+SPLGA ALAGTTFPIDR T LL F+ Y NSLDAVSDRDFI+EFLS +
Sbjct: 183 YEDSLKRLNVSPLGAGALAGTTFPIDREYTAELLGFDGIYENSLDAVSDRDFIVEFLSAS 242

Query: 246 SILMMHLSRFCEEIINWCSYEYQFITLSDTFSTGSSIMPQKKNPDMAELIRGKTGRVYGN 305
S+LM HLSR CEE+I W S E+QF+ + D F+TGSSIMPQKKHPDMAELIRGKrGRVYG+
Sbjct: 243 SIMTHLSRLCEELILWSSQEFQFVEMDDAFATGSSIMPQKKNPDMAELIRGRTGRVYGS 302

Query: 306 LFSLLTVMKSLPIA™KDLQEDKEGMFDSVETVSIAIEI^ffiNMLETMTVIffiHIM^^ 365

LPSLLTV+K LPLAYNKD+QEDKEGMFD+V+TV ++ I A M++TM V E M + 
Sbjct: 303 LFSLLTVLKGLPIAYISffiDMQEDKEGMFDAVKTVKGSrAIFAGMIQTMKVKEETlOTKAra^ 362 

Query: 366 DFSNATELADYLASKGVPFRKAHEIVGICLVLECSKNGSYLQDIPLKYYQEISELIENDIY 425

DFSNATELADYLA+KG+PFR+AHE+VGKLVL C + G YL D+PL Y+ S+L + DIY 
Sbjct: 363 DFSJmTEtJUJYIATKGMPFREftHEVVGKLVLLCIQKGIYIjLDLPLSDYKAASDLFDEDiy 422 

Query: 426 EILTARTAVKRRNSLGGTGFDQVKRQILLARKEIi 459 
++L KT V RR S GGTGF +VKK I A K L

Sbjct: 423 DVLQPKTVVARRTSAGGTGFTEVKKAIAKAEKIL 456 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2105

A DNA sequence (GBSx2220) was identified in S.agalactiae <SEQ ID 6509> which encodes the amino 
acid sequence <SEQ ID 6510>. This protein is predicted to be class-II aldolase (fba). Analysis of this 
protein sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2930 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9289> which encodes amino acid sequence <SEQ ID 9290> 
was also identified. Analysis of this sequence reveals: 

GvH: Signal Score (-7.5): -2.92

Possible site: 42
>>> Seems to have no N-terminal signal seq.
ALOM program count: 0 value: 0.37 threshold: 0.0
PERIPHERAL Likelihood = 0.37 66
modified ALOM score: -0.57

*** Reasoning Step: 3

Final Results

bacterial cytoplasm --- Certainty=0.2930 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB16889 GB:AB050113 class-II aldolase [Streptococcus bovis] 
Identities = 221/242 (91%), Positives = 234/242 (96%)

Query: 1 MAIVSAEKFVQAARDNGYAVGGFimiNLEm'QAILRAAEAKKAPVLIQTSMGaAKYMGGY 60 

MAI VSAEKF+ +AAR+NGYAVGGFNTIilNLEWTQAI LRAAEAKKAP+ LI QTSMGAAKYMGGY

Sbjct: 1 MAIVSAEKFIKAARENGYAVGGFimTOIiEWTQAILRAAEAKKAPILIQTSMGAAKYMGGY 60 

Query: 61 KLCKQLIETLVESMGITVPVAIHLDHGHYDDALECIEVGYTSIMFDGSHLPVEENLEKAR 120 
KLCK LIE LVESMGITVPVAIHLDHGH++DALECIEVGYTS+MFDGSHLPVEENLEKA+ 
Sbjct: 61 KLCKTLIENLVESMGITVPVAIHLDHGHFEDALECIEVGYTSVMFDGSHLPVEENLEKAK 120

Query: 121 EWAKAHAKGISVEAEVGTIGGEEDGIVGKGELAPIEDAKflMVETGIDFLAAGIGNIHGP 180 

EVWAKAHAKB+SVEAEVGTIGGEEDGIVG GELAPIEDAKAMV TGIDFLAAGIGNIHGP 
Sbjct: 121 EWAKAHAKGVSVEAEVGTIGGEEDGIVGGGELAPIEDAKAMVATGIDFLAAGIGNIHGP 180 



WO 02/34771 PCT/GB01/04789

Query: 181 YPAOTffiGLDLDHLKKI,TEAVPGFPIVIiHGGSGIPDDQIQEAIKLGVAKVNWrECQLAFC 240 

YPANW+GL LDHLKKLT AVPGFPIVLHGGSGIPDDQI+ AIKLGVAKVNVNTECQ+AF . 
Sbjct: 181 YPANWQGLHLDHLKKLTAAVPGFPIVLHGGSGIPDDQIKAAimroiUWN^^ 240 


Query: 241 QA 242 
+A 

Sbjct: 241 KA 242 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6511> which encodes the amino acid
sequence <SEQ ID 6512>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2930 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 217/242 (89%), Positives = 228/242 (93%)

Query: 1 MAIVSAEKFVQAARmGYAVGGEimilNLEVWQAILRAAEAKKAPVLIQTSMGRAK™ 60 
miVSAEKFVQaaR+NGYAVGGFNllSlNLEVmSAILiyiftEAK+APVLIQTSMGa^^ 
Sbjct: 1 MAIVSftEKFVQaftRENSYAVGGEOTTraLEVTOAILRAAEAKQAPVLIQTSMGZUU™ 60

Query: 61 KLCKQLIETLVESMGITVPVAIHLDHGHYDDALECIEVGYTSIMFDGSHLPVEENLEKAR 120 

K+C+ LI LVESMGITVPVAIHLDHGHY+DALECIEVGYTSIMFDGSHLPVEENL K 
Sbjct: 61 KVCQSLITNLVESMGITVPVAIHLDHGHYEDALECIEVGYTSIMFIXSSHLPVEENLaK^ 120 


Query: 121 EWAKaHAKBISVEAEVGTIGGEEDGIVGKGELAPIEDAKAMVETGIDFLAaGIGNIHGP 180 

EW AHAKB+SVEAEVGTIGGEEDGI+GKjGELAPIEDAKAMVETGIDFLaftGIGNIHGP 
Sbjct: 121 EWKIAHAKGVSVEAEVGTIGGEEDGIIGKGELAPIEDAKaMVETGIDFLAAGlGNIHGP 180 

Query: 181 YPANWEGLDLDHLKKLTEAVPGFPIVLHGGSGIPDDQIQEAIKLGVAKVNVNTECQLAFC 240

YP NWEGL LDHL+KLT AVPGFPIVLHGGSGIPDDQI+EAI+LGVAKVNVNTE Q+AF 
Sbjct: 181 YPENWEGLALDHLEKLTAAVPGFPIVmGGSGIPDDQIKEAIRLGVAKMJVNTESQIAFS 240 

Query: 241 QA 242 
A

Sbjct: 241 NA 242 

SEQ ID 9290 (GBS683) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 150 (lane 8 & 10; MW 55kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 150 (lane 11-13; MW 30kDa) and in
Figure 184 (lane 11; MW 30kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
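
The identity and positive counts reported for each alignment can, in principle, be recovered from the middle line of an alignment block, where a letter marks an identical residue and `+` marks a conservative substitution. The following Python sketch is illustrative only; `count_matches` is a hypothetical helper, and it assumes three strings of equal length in correct column register, which the listings reproduced here do not reliably preserve.

```python
def count_matches(query, midline, sbjct):
    """Count identities, positives and gap characters in one aligned block.

    Assumes the three strings are column-aligned: a letter in the middle
    line marks an identity, '+' a conservative substitution, and '-' in
    either sequence a gap position.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("aligned strings must have equal length")
    identities = sum(1 for c in midline if c.isalpha())
    conservative = midline.count("+")
    gaps = query.count("-") + sbjct.count("-")
    # Positives include identities as well as conservative substitutions.
    return identities, identities + conservative, gaps


# Final two-residue block of the GAS/GBS alignment format used above.
print(count_matches("QA", "+A", "KA"))  # (1, 2, 0)
```

Summing these counts over all blocks of an alignment should give the numerators of the Identities, Positives and Gaps figures, provided the column register of the listing is intact.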

Example 2106 

A DNA sequence (GBSx2221) was identified in S.agalactiae <SEQ ID 6513> which encodes the amino
acid sequence <SEQ ID 6514>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2775 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA88585 GB:M18954 unknown protein [Streptococcus mutans]
Identities = 109/229 (47%), Positives = 156/229 (67%), Gaps = 1/229 (0%)



Query: 1 MFSGKRLKKRRITLGYSQSELADKLHINRSSYFNWENEKTKPNQSNLKQLAILLDVPETY 60 

MFS ++LK+RR LG SQ++ ADKL I+R SYFNWE KTKPNQ NL +LA LL V Y 
Sbjct: 1 MFSSQKLKERRKKLGLSQAQTADKLGISRPSYETniffilGKTKPNQKNriDKL^ 60 


Query: 61 FESEYKIVlTIYLQLSLQNQEKVEKYAEELLQTQKVHEKIVPLFAVEVLSEIQLSftGPGEG 120 
F S++ IV Y N+ K KY++ LI.+ Q ++ +LSAG G 

Sbjct: 61 FLSQHDIVEIYTRIJIffiSim'KTLKySQHLLEQQDKKia^KNK^ 120

Query: 121 LYDEFETETVYSEDEYTGFDIATWISGNSMEPVYKDGEVALIRSTGFDHDGAVYALNWNG 180

+ + +TV+ ++E D A+WI G+SMEP++ +GEVaLI+ TGFD+DGA+YA++W+G 
Sbjct: 121 YFGIXSNFDTVFYDEEID-HDFASWIFGDSMEPIFLNGEVRLIKQTOFDYDGAIYAIDV^^ 179 

Query: 181 SLYIKKLYREEDGFRMVSINPDVRERFIPFEDEIRIVGKIVGHFMPVIG 229 
YIKK+YREE 6 R+VS+N A++F P+++ RI+G IVG+F+P+ G

Sbjct: 180 QTYIKKVYREETGLRLVSLNKKYADKFAPYDENPRIIGLIVGNFIPLEG 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6515> which encodes the amino acid 
sequence <SEQ ID 6516>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4340 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 84/209 (40%), Positives = 130/209 (62%), Gaps = 9/209 (4%) 

Query: 25 LHINRSSYFNWENEKTKPNQSNLKQIAILLDVPETYFESEYKZVNTYLQIjSLQNQEKVEK 84
LH+N+ + NWE K EN+ +L L L +V YF+ Y+++ Y QL++ N+EKV
Sbjct: 5 LHVNKMTISNVffiRGKNIPNEKHUSIALLHLENVTSDYFDPNYRLLTPYNQLTISNKEKVIG 64

Query: 85 YAEELLQTQ KVHEKIVPLFAVEVLSEIQLSAGPGEGLYDEFETETVYSEDEYTG 138
Y+E LL Q + +K L+A V LSAG G + + + V+ DE
Sbjct: 65 YSERLLNHQIDKKSKDLIDKPSQLYAYRVYES--LSAGTGYSYFGDGNFDVVPY-DEQLE 121

Query: 139 FDIATWISGNSMEPVYKDGEVALIRSTGFDHDGAVYALNWNGSLYIKKLYREEDGFRMVS 198
+D A+W+ G+SMEP Y +GEV LI+ FD+DGA+YA+ W+G YIKK++RE++G R+VS
Sbjct: 122 YDFASWVFGDSMEPTYUraEVVLIKQNSFDYDGAIYAVEWDGQTYIKKVFREDEGLRLVS 181

Query: 199 INPDVAERFIPFEDEIRIVGKIVGHFMPV 227
+N +++F P+ +E RI+GKI+ +F P+
Sbjct: 182 LNKKYSDKFAPYSEEPRIIGKIIANFRPL 210





Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2107 

A DNA sequence (GBSx2222) was identified in S.agalactiae <SEQ ID 6517> which encodes the amino
acid sequence <SEQ ID 6518>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2387 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
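
The "Final Results" lines list a certainty value for each candidate localization, and the predicted localization is the one with the highest value. The following Python sketch shows one way such lines might be read; it is a hedged illustration, not part of the original analysis. The names `parse_final_results` and `top_localization` are hypothetical, and the sample lines are cleaned-up copies of the values given in this example.

```python
import re

# Accepts optional dashes and stray spaces between the fields.
RESULT_RE = re.compile(
    r"^\s*(bacterial \w+)\W+Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(lines):
    """Return a list of (localization, certainty, label) tuples."""
    results = []
    for line in lines:
        m = RESULT_RE.match(line)
        if m:
            results.append((m.group(1), float(m.group(2)), m.group(3)))
    return results


def top_localization(results):
    """Return the entry with the highest certainty value."""
    return max(results, key=lambda r: r[1])


lines = [
    "bacterial cytoplasm Certainty=0.2387 (Affirmative)",
    "bacterial membrane --- Certainty=0.0000 (Not Clear)",
    "bacterial outside Certainty=0.0000 (Not Clear)",
]
results = parse_final_results(lines)
print(top_localization(results))
```

For this example the highest-scoring entry is the bacterial cytoplasm, in agreement with the "Affirmative" label.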

Example 2108 

A DNA sequence (GBSx2223) was identified in S.agalactiae <SEQ ID 6519> which encodes the amino 
acid sequence <SEQ ID 6520>. This protein is predicted to be UmuC MucB homolog (uvrX). Analysis of 
this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2195 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9925> which encodes amino acid sequence <SEQ ID 9926>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC98439 GB:L29324 UmuC MucB homolog [Streptococcus pneumoniae]
Identities = 303/436 (69%), Positives = 360/436 (82%)

Query: 39 LHTSLCVMSRADNSAGLIIASSPMFKKVFGKGNVGRAYDLPFDVHTRKFNYYR&KISGLP 98
L LCVMSRADNSAGLILASSPMFKKVFGK NVGR+YDLPFDV TRKF+YY AK GLP
Sbjct: 5 LRLRLCVMSRaDNSAGLILASSPMFKKVFGKSNVGRSYDLPEDVKTRKFSXyNa^^ 64

Query: 99 TDAKFVSFIENWAKRTFIVPPRMDLYIQKNLEIQKVFQNYADPTDILPYSIDEGFIDLTS 158
T +V +IE WAK T IVP L I N+EIQK+FQ++A P DI PYSIDEGFIDLTS
Sbjct: 65 TTIDYVRYIEEWAKSTVIVPREWILTIAVNMEIQKIFQDFAAPDDIYPYSIDEGFIDLTS 124

Query: 159 SlaNYFVEDKSLSRKDKLDWSAKIQHDIWEKTGVYSTVGMSNANPLLAKLALDNF^^ 218
SLNYFV DKS+SRKDKLD++SA IQ IW KTG+YSTVGMSNANPLLAKLALDNEAK T
Sbjct: 125 SlilKFVPDKSISRKDKLDIISAAIQKKimKTGIYSTVGMSNftNPLIAKliALDNBA^ 184

Query: 219 TMRR^mSYEDVETKVV™IPK^ra)PWGIGSRTEKRllIKLGIYSIKELRNCDPT^ 278
TMRANWSYEDVE KVW IPKMTDFWGIG+R EKRL+ LGI+SIKELA +P ++KKE G+
Sbjct: 185 TMRANWSYEDVEKKVWTIPKMTDFWGIQNRMEKRLHNLGIFSIKELAQftNPDLIKK^ 244

Query: 279 IGVQHWFHANGIDESNVHEPYRPKAVGIGNSQVLHKDYTRQSDIELVLREMAEQVAIRLR 338
+G++ WFHANGIDESNVH+PY+PK+ GIGNSQVL KDY +Q DIE++LREMaEQVA+RLR
Sbjct: 245 MGLELWFHANGIDESNVHKPYKPKSKGIGNSQVLPKDYIKQRDIEIILREMaEQVAVRLR 304

Query: 339 RRHKKATNAMNVGYSNFENKKSINVQRKINPNNRTLVFQDEWSLFRSKYDGGA 398
R KKATW+I++GYS E K+SIN Q KI P N+T + + V+ LF +KY GA+R++A
Sbjct: 305 RSGKKATWSIHLGYSKVEQKRSINTQMKIEPTNQTALLTNYVLKLFHTKYTSGAIRNVA 364

Query: 399 VRYDGLVDENFAVISLFDDFEESEKEEKLETTIDSIRDRFGFLAVQKASSIiLENSRAlSR 458
V Y GLVDE+F +ISLFDD E+ EKEE+L++ ID+IR F6F ++ K ++L + SR I+R
Sbjct: 365 VNYSGLVDESFGLISLFDDIEKIEKEERLQSAIDAIRTEFGFTSLLKGNRLDQASRTIAR 424

Query: 459 SRLVGGHSAGGLEGLK 474
S+L+GGHSAGGL+GLK
Sbjct: 425 SKLIC3GHSAGGLDGLK 440

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2109 

A DNA sequence (GBSx2224) was identified in S.agalactiae <SEQ ID 6521> which encodes the amino 
acid sequence <SEQ ID 6522>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4016 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2110 

A DNA sequence (GBSx2225) was identified in S.agalactiae <SEQ ID 6523> which encodes the amino 
acid sequence <SEQ ID 6524>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2088 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG13G01 GB:AF227520 unknown [Streptococcus pneumoniae] 
Identities = 68/122 (55%), Positives = 89/122 (72%), Gaps = 6/122 (4%)

Query: 1 MIDRSYLPFKVRREYQDRKMAKWMGFFLSEHTAGIiDSELNKVDVTSELSISDKL^^ 60 

MIDRSYLPF+ AREYQD KM KWMGFFLSEHT+ L + NKV Y S+LS+ KLLLL+Q+ 
Sbjct: 1 MIDRSYLPFQSAREYQDTKMQKWMGFFLSEHTSALTDDANKVTYMSDLSLEICKLLDLSQV 60 

Query: 61 YSNQLNGIIAVPGQ YYSGKVDNLTFMHVSLKTKTGFVSIPIKDILSIDL--EVEYE 114 

Y+ Qm I V + Y+G + +LT + + +KT TG +++ +KDI+SI+L EV YE 

Sbjct: 61 YAGQIJmillOTKICNNQVSYTGTIPSLTKDFIIjIKTTTGHINLKIiKDIVSIELVEEVLYE 120 

Query: 115 SA 116 
SA 

Sbjct: 121 SA 122 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2111

A DNA sequence (GBSx2226) was identified in S.agalactiae <SEQ ID 6525> which encodes the amino 
acid sequence <SEQ ID 6526>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4025 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9927> which encodes amino acid sequence <SEQ ID 9928> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2112 

A DNA sequence (GBSx2227) was identified in S.agalactiae <SEQ ID 6527> which encodes the amino 
acid sequence <SEQ ID 6528>. This protein is predicted to be soluble transducer HtrXIII. Analysis of this
protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5246 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2113 

A DNA sequence (GBSx2228) was identified in S.agalactiae <SEQ ID 6529> which encodes the amino
acid sequence <SEQ ID 6530>. Analysis of this protein sequence reveals the following: 

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5131 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2114 

A DNA sequence (GBSx2229) was identified in S.agalactiae <SEQ ID 6531> which encodes the amino
acid sequence <SEQ ID 6532>. This protein is predicted to be pX02-78. Analysis of this protein sequence
reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2105 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF13682 GB:AF188935 pX02-78 [Bacillus anthracis] 
Identities = 101/314 (32%), Positives = 147/314 (46%), Gaps = 46/314 (14%)

Query: 27 SGQIYEHPDHDSETlIFADTOTFKWFSRDIQGDVIDFVQLVAGVSFKKALSYIiETG--GFE 84 

S + Y +HDS I N F W SR + G++I FVQ V SF A+ L G +E

Sbjct: 39 SERYYRLTEHDSLIIDRKKNQFYWNSRGVNGNIIKFVQEVEDASFPGRMQRliLDGEQDYE 98 

Query: 85 EAKVIEETYQPFQYYLREEP FQQARTYLKDIRGLSNQTINSFGRQGLLAQATyQAE 140 

+A I +P+ Y E+ F +AR YL + R + Q +++ +GL+ Q Y 

Sbjct: 99 KASEITFVSEPYDYEHFEQKEVSRFDRAREYLIEERKIDPQVVDAIiHNKGLIKQDKYN-- 156



30 



Query: 141 SVLVFKSFDHNGTLQAASLQGLVKNEEKYDRGYLICKIMKGSHGHVGISFDIGNPKRLIFC 200 

+VIi G + S QG+VK++ KYRG KIKS + G+ GP+LF 

Sbjct: 157 NWliFLWKDRETGAVMGGSEQGVVKSD-KYKRGAWKSIQKNSTANYGENVIMSEPRlS^ 215 

Query: 201 ESVIDmSYYQLHQKQLSDVRLISMEGLKLSVIAYQTIJUaAEEQGKIiAFIiDTVKPIRLS 260 

ES ID++SY LH+ L D LISMEGLK VI + 
Sbjct: 216 ESDIDLLSYATLHKHNLKDTHLISME6LKPQVI FN 250 

Query: 261 HYLOAIQETTTFPQTHSNVITMAVDNDEfiGREFYQKL SDRGFPIFQ-DLPPLQ 312 

+Y++A + + +++ VDND+A6+ F ++L +D F+ + P 

Sbjct: 251 YYMKACERIGDV PDSLSLCVDNDKAGKAFVERLIHFRYEKNDGSIVAFKPEYPQAP 306 

Query: 313 RLETKSDWNDIVKR 326 
E K DWND KR 

Sbjct: 307 SEEKKWDWNDECKR 320 
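
The "Identities" figure reported for an alignment is the number of aligned positions at which the query and subject residues are the same. A minimal sketch, assuming two equal-length aligned strings with `-` marking gaps, is given below using the last row of the alignment above (query residues 313-326, subject residues 307-320). The function name `count_identities` is hypothetical and is not taken from any alignment tool.

```python
def count_identities(query, subject):
    """Count aligned positions where both rows carry the same residue.

    Both arguments are rows of one pairwise alignment, so they must have
    the same length; '-' denotes a gap and never counts as an identity.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Final row of the alignment shown above.
query_row = "RLETKSDWNDIVKR"
subject_row = "SEEKKWDWNDECKR"
print(count_identities(query_row, subject_row), "of", len(query_row))
```

The identical positions found this way correspond to the letters printed on the middle line of that row (E, K, DWND, KR).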

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2115 

A DNA sequence (GBSx2230) was identified in S.agalactiae <SEQ ID 6533> which encodes the amino 
acid sequence <SEQ ID 6534>. Analysis of this protein sequence reveals the following: 

Possible site: 20 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.7013 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2116 

A DNA sequence (GBSx2231) was identified in S.agalactiae <SEQ ID 6535> which encodes the amino 
acid sequence <SEQ ID 6536>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1310 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2117 

A DNA sequence (GBSx2232) was identified in S.agalactiae <SEQ ID 6537> which encodes the amino 
acid sequence <SEQ ID 6538>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.6726 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9373> which encodes amino acid sequence <SEQ ID 9374> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2118 

A DNA sequence (GBSx2233) was identified in S.agalactiae <SEQ ID 6539> which encodes the amino 
acid sequence <SEQ ID 6540>. This protein is predicted to be phosphoglucomutase (manB). Analysis of 
this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2147 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9355> which encodes amino acid sequence <SEQ ID 9356> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96418 GB:AJ243290 phosphoglucomutase [Streptococcus thermophilus] 
Identities = 391/465 (84%), Positives = 424/465 (91%), Gaps = 1/465 (0%) 

Query: 1 MAQHGIKSWFEALRPTPELSFAVRHLNAYAGIMVTASHNPAPFNGYKVYGQDGGQLPPA 60 
+A HGIKSYVFE+LRPTPELSFAVRHL+ +iM3IM+TASHNPAPFIIGYKVYG+DGGQ+PPA 
Sbjct: 107 LAftHGIKSYVFESLRPTPELSFAVRHLHTFAGIMITASHNPAPENGYKVYGEDGGQMPPA 166 

Query: 61 DADaLTDFIRAIENPFAVELADI^ESKSSGLIQVIGErrTOIEYLREVKDVNINQDIiINNF 120 

DADALTD+IRAI+NPF V+LADL++SK+S6LI++IGE+VD EYL+EVKDVNINQDLIN + 
Sbjct: 167 DADALTDYIRAIDNPFTVKIiADIiEDSKaSGLIEIIGENVDAEYLKEVKDVNINQDLINEY 226 

Query: 121 GKDMKIVYTPLHGTGEMLTRRALAQAGFESWWESQAKADPDFSTVKSPNPESQAAFAL 180 

G+DMKIVYT LHGTGEML RRALAQAGF++V WE+QA DF TVKSPNPE+Q AFAL 
Sbjct: 227 GRDMKIVYTSLHGTGEMDVRRALAQAGFnAVQVVEAQAVPHaDFLTVKSPNPENQDAFAL 286 

Query: 181 AEEI/SREVnanVLmTDPDADRLGVEIRQPDGSYKNLSQNQIGAIIAKYILEaHia'AGTL 240 

AEELGR VnADVLVATDPDRDRLGVEIRQPDGSY NLSGNQIGAIIAKYILEftHKTAGTL 
Sbjct: 287 AEEIX3HimiRI>VLVATDPn2a3RU3VEIRQPIX3SYLNLSGNQIGRIIAKYILERI^ 346 ' 

Query: 241 PENAALAKSIVSTKLVTKIAESYGATMFNVLTGFKFIAEKIQEFEEKHNHTYMFGFEESF 300 
P NAAL KSIVSTELVTKIAESYGATMENVLTGFKFI EKI EFE +HN+TYMFGFEESF 

Sbjct: 347 PANAALCKSIVSTELVTKIAESYGATMFNVLTGFKFIGEKIHEFETQHNYTYMFGFEESF 406 

Query: 301 GYLIKPFVRDKDAIQAVLLVaEIAAYYRSRGLTIADGIDEIYKEYGYFAEKTISVTLSGV 360 
GYLIKPFVRDKDAIQAVL+VAEIAAYYRSRG+TLADGI+EIYK+YGYF+EKTISVTLSCSV 
Sbjct: 407 GYLIKPFVRDKmiQAVLIVAEIAAYYRSRGMTLADGIEEIYRQYGYFSEICriSVTLSGV 466 

Query: 361 DGAAEIKKIMDKFRENGPKQFNNTDIVLLEDFQKQTATKNDGTISNLTTPPSNVLKYTIiA 420 

DGAAEIKKIMDKFR N PKQFNWTDI EDF +QTAT DG + LTTPPSNVLKY LA 
Sbjct: 467 DGAAEIKKIMDKPRRNAPKQFNNTDIAKTEDFLEQTATTADG-VEKLTTPPSNVLKYILA 525 

Query: 421 DDSWIAVRPSGTEPKIKFYIATVGNDLftDftETKIANIEKElTTFV 465 

DDSW AWPSGTEPKIKFYIATVG ADA+ KIANIE EI FV 
Sbjct: 526 DDSWFAVRPSGTEPKIKFYIATVGETEADAKEKIANIEAEINAFV 570 
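
Each alignment header gives counts and percentages, for example "Identities = 391/465 (84%)". For the headers checked in this document, the printed percentage is consistent with the ratio truncated to a whole number (for instance 147/314 is reported as 46%, not 47%). The following Python is a minimal sketch of a consistency check under that assumption; truncation is an observation from these examples rather than a documented property of the alignment program, and the name `check_alignment_stats` is hypothetical.

```python
import re

# Matches e.g. "Identities = 391/465 (84%)"; tolerates stray spaces.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def check_alignment_stats(header):
    """Return {name: (count, length, reported_pct, consistent)}.

    'consistent' is True when the reported percentage equals the
    integer-truncated value of 100 * count / length (assumed convention).
    """
    stats = {}
    for name, count, length, pct in STAT_RE.findall(header):
        count, length, pct = int(count), int(length), int(pct)
        stats[name] = (count, length, pct, (100 * count) // length == pct)
    return stats


header = "Identities = 391/465 (84%), Positives = 424/465 (91%), Gaps = 1/465 (0%)"
for name, values in check_alignment_stats(header).items():
    print(name, values)
```

A check of this kind can help spot digits that were corrupted during text extraction, since a damaged count or percentage will usually fail the comparison.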

There is also homology to SEQ ID 6156: 

Query: 1 MAQHGIKSYVFEALRPTPELSFAVRHUaYAGIMVTASHNPAPENGYKVYGQDGGQLPPA 60 

+AQHGIKSYVFEAI.RPTPELSFAVRHLNAYAGIMVTASHNPAPFNGYKVYGQDGGQLPPA 
Sbjct: 107 LAQHGIKSYVFEALRPTPELSFAVRHLNAYAGIMVTASHNPAPFNGYKVYGQDGGQLPPA 166 

Query: 61 DADALTDFIRAIENPFAVEIADLDESKSSGLIQVIGEDVDIEYLREVKDVNINQDLINNF 120 

DADALTDFIRAIENPFAVELADLDE+KSSGLIQVIGEDVD+EYLREVKDVNINQDLINNF 
Sbjct: 167 DADALTDFIRAIENPFAVELftDLDENKSSGLIQVIGEDVDMEYLREVKDVNINQDLINNF 226 

Query: 121 GKDMKIVYTPIiHGTGEMLTRRALRQAGFESVWVESQAKADPDFSTVKSPNPESQAAFAL 180 
GKDMKIVYTPLHGTGEmTREALAQRGFESVVVVESQAKftDPDFSTVKSPNPESQaaFaL 

Sbjct: 227 GKDMKIVYTPLHGTGEMLTRRALAQAGFESVVVVESQAKaDPDFSTVKSENPESQRAFAL 286 

Query: 181 AEELGREVDADVLVATDPDADRLGVEIRQPDGSYKNLSGNQIGAIIAKYILEAHKTAGTL 240 
AEEI/3REV+ADVLVATDPDADRLGVEIRQPDGSYKSILSGNQIGAIIAKYILEAHKTAGTL 
Sbjct: 287 MELGREVEADVLVATDPDADRLGVEIRQPDGSYKNLSaSFQIGAIIAKYILEAHKTAGTL 346 

Query: 241 PENRALAKSIVSTELVTKIAESYGATMENVLTGFKFIAEKIQEFEEKHNHTYMFGFEESF 300 

PENAALAKSIVSTELVTKIAESYGATMFNVLTGFKFIAEKIQEFEEKHNHTYMFGFEESF 
Sbjct: 347 PENAALAKSIVSTELOTKIAESYGATMFNVLTGFKFIAEKIQEFEEKHNHTYMFGFEESF 406 

Query: 301 GYLIKPFVia)KDAIQftVIjLVaEiaAYYRSRGLTLaiX3IDEIYKEYGYFAEKTISVTLSGV 360 

GYLIKPFVMJKDAIQAVLLVAElaAYYRSRGLTrjUXSIDEIYKEYGYFAERriSVTLSGV 
Sbjct: 407 GYLIKPFVRDKDAIQAVLLVAEiaAYYRSRGLTLMXSIDEIYKEYGYFAEKTISVTLSGV 466 

Query: 361 DGAAEIKKIMDKFRENGPKQFNNTDIVLLEDFQKQTATKNDGTISNLTTPPSNVLKYTLA 420 

DGAAEIKKIMDKFRENGPKQFNimJIVLLEDFQKQTATKtTCTISNLTTPPS]^ 
Sbjct: 467 DGAAEIKKI^roKFRENGPKQFNNTDIVLLEDFQKQTATKNDGTISNLTTPPSNVLKYTriA 526 

Query: 421 DDSWIAVRPSGTEPKIKFYIATVCaiDLADAETKIANIEKEITTPV 465 

DDSWIAVRPSGTEPKIKFYIAT+G+ L A+ KIANIE EI TFV 
Sbjct: 527 DDSWIAVRPSGTEPKIKFYIATIGDTLDIAQEKIANIETEINTFV 571 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2119 

A DNA sequence (GBSx2235) was identified in S.agalactiae <SEQ ID 6541> which encodes the amino 
acid sequence <SEQ ID 6542>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 



A related GBS nucleic acid sequence <SEQ ID 9905> which encodes amino acid sequence <SEQ ID 9906> 
was also identified. There is also homology to SEQ ID 32. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2120 

A DNA sequence (GBSx2236) was identified in S.agalactiae <SEQ ID 6543> which encodes the amino 
acid sequence <SEQ ID 6544>. This protein is predicted to be ABC transporter, ATP-binding protein 
(msbA). Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial cytoplasm --- Certainty=0.1564 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 



>GP:AAD35376 GB:AE001710 ABC transporter, ATP-binding protein 
[Thermotoga maritima] 
Identities = 216/552 (39%), Positives = 336/552 (60%), Gaps = 3/552 (0%) 



Query: 26 MALLGTVVQVCLTVYLPVLIGQAVDVVLSPHSMILLLPIMWKMIAVILftNTIIQWIHPLL 85 
M + V L V P LIG+ +DW P LL M + + +++ W+ + 
Sbjct: 41 MVFVFVTVSSILGVLSPYLIGKTIDWFVPRRFDLLPRYMLILGTIYALTSLLFWLQGKI 100 

Query: 86 YimLIFHYVASLRKAVMEKIiNLLPIAYIiDKRGIGDLISRVTTDTEQLSNGL^ 145 

L V LRK + EKL +P+ + D+ GD+ISRV D + ++N L QPF 
Sbjct: 101 MLTLSQDWPRLRKELFEKLQRVPVGFFDRTPHGDIISRVIiroVDNINNVLGNSIIQFFS 160 

Query: 146 GLLTIIVTIPSMAKIDLLMLFLVLFLTPLSLFLARFIAKKSY-HLYQNQTASRGRQTQFI 204 

G++T+ + M ++++++ + L + PL++ + + ++++ + Y+NQ G+ I 
Sbjct: 161 GIOTUWaVIMMFRVNVILSLOTLSIVPLTVLITQIVSSQTRKYFYENQRVL-GQIJS^^ 219 

Query: 205 EEMVSQESLlQaJFSRQEESSDHFRTINQEYftNFSQSAIFYSSTVNPSTRFINSLIYGFLA 264 

EE +S F+ +E+ + F +N+ A +S + P +N+I1 + ++ 

Sbjct: 220 EEDISGLWIKLFTREEKE^ffiKFDRVNESLRKVGTKAQIFSGVLPPIJ1NMVN^^ 279 

Query: 265 GIGALRIMSGAFSVGQLITFLNYVNQYTKPFNDISSVLSEMQSALACAERLYSILEESSP 324 

G G + +VG + TF+ Y Q+T+P N++S+ + +Q ALA. AER++ IL+ 
Sbjct: 280 GFGGWIiALKDIITVGTIATFIGYSRQFTRPIiNELSNQFimiQMALASAERIFEiriDLEEE 339 

Query: 325 NITGTEKLDSSTVKGQIDFKlWVFGyNKSKLLLNGIOTiHIPAGAKVAIVGPTGAG^^ 384 
+ ++ V+G+I+FKNV F Y+K K +L I HI G KVA+VGPTG+GK+T++ 

Sbjct: 340 K-DDPimVELRETOGEIEFKNWFSYDKKKPVLKDITFHIKPGQKSffiLVGPTGSGKTTIV 398 

Query: 385 NLIMRFYEVDGGNILLDCKPITDYEPSQLRQEIGMVLQETWLKSATIHDNIAYANPKASR 444 
NL+MRFY+VD G IL+D I + S LR IG+VLQ+T L S T+ +N+ Y NP A+ 
Sbjct: 399 NLL^TOFYOTDRGQILVDGIDIRKIKRSSIlRSSIGIVLQDTlLFSTTVKENLKYGt^^ 458 

Query: 445 EEVIEAAKRftNADFFIKQLENGYDTyLEDAGDSLSQGQCQLLTIARIFLKLPRILILDEA 504 

EE+ EAAK ++D FIK LP GY+T L D G+ LSQGQ QLL I R FL P+ILILDEA 
Sbjct: 459 EEIKEAAKLTHSDHFIKHLPEGYETVLTDNGEDLSQGQRQr.LAITRAFLANPKII.IIjDEA 518 

Query: 505 TSSIDTRTEVLVQEAFQMLMKGRTSFIIAHRLSTIQTADIILVMVSGEIVEVGNHSELMA 564 

TS++DT+TE +Q A LM+G+TS IIAHRL+TI+ AD+I+V+ GEIVE+G H EL+ 
Sbjct: 519 TSNVDTKTEKSIQAAMWIaM;GKTSIIIaHRIOTIKNaDLIIVIlRDGEIVE^raKHD 578 

Query: 565 QKGIYYQMQNRQ 576 

++G Yy++ +Q 
Sbjct: 579 KRGFYYELFTSQ 590 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6545> which encodes the amino acid 
sequence <SEQ ID 6546>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.4227 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAD35376 GB:AE001710 ABC transporter, ATP-binding protein 
[Thermotoga maritima] 
Identities = 206/572 (36%), Positives = 342/572 (59%), Gaps = 5/572 (0%) 



Query: 2 IKTDHHLLKRVLQDLLKKPLPVCILVIASFVQVG--LSVYLPVLIGKAVDMSLSVNSWQT 59 

+K L+R+L L +P ++++ FV V L V P LIGK +D+ + 

Sbjct: 18 LKNPTATLRRLIX3YL--RPHTFTLIMVFVFVTVSSILGVLSPYLIGKriDVVFVPRRFDL 75 



Query: 60 LKWLLGQMLVIIVVNTLIQWVMPLVYSRLLYQYSQQLKDKLLEKIHRLPFAYLDRQTIGD 119 
L + + I + +L+ W+ + L +L+ +L EK+ R+P + DR GD 

Sbjct: 76 LPRYMLILGTIYALTSLLFWLQGKIMLTLSQDWFRLRKELFEKLQRVPVGFFDRTPHGD 135 

Query: 120 LVSRVITDTEQLINGLQWFNQFILGLLTILCTIIAmQIDWLMLILVLVLTPSSLFIiAR 179 

++SRVI D + + N L QF G++T+ +1 M +++ ++ ++ L + P ++ + + 

Sbjct: 136 lISRVINDVDNINNVLGNSIlQFFSGIVTIJUaVimFRVIWILSLVTLSIVPLl^ 195 

Query: 180 FIAQKSPHyAQAQTKSRG^^^QFTEEILRQEGLVQLE^IaQEQSICDYHVLNKTYCEASQK 239 

++ ++ Y + G L EE + +++LF +E+ + + +N++ + K 

Sbjct: 196 IVSSQTRKYFYENQRVLGQLNGIIEEDISGLTVIKLFTREEKEMEKFDRVNESLRKVGTK 255 

Query: 240 AIFYASTVNPATRFINSVIYALLAGLGAVRIMAGLFSVGQLTTFLNVWQYTKPFNDISS 299 

A ++ + P +N++ +AL++G G + + +VG + TF+ Q+T+P N++S+ 
Sbjct: 256 AQIFSGVLPPLMNMVNlSn^GFALISGFGGWI^iaDIITVGTIATFIGYSRQFTRPiaiELSN 315 

Query: 300 VLAEIQSSLACAQRLYDLLDIEIKEQEHFLTFKASAVKGQIDFEEVSPSYQKDRPLLKDI 359 
IQ +IiA A+R++++LD+E +E++ + V+G+I+F+ V FSY K +P+LKDI 

Sbjct: 316 QFNMIQMALASAERIFEIIJDLE-EEKDDPDAVELRETOGEIEFKNVWFSYDKKKPVLKDI 374 

Query: 360 NFSVPAGSKVAIVGPTGAGKSTLINLLMRFYELDAGSIKLDKVPIKCYAKEELRSITGIV 419 
F + G KVA+V(3PTG+GK+T++NIiIiiyiRFY++D G I +D + 1+ + LRS GIV 
Sbjct: 375 TFHIKPGQKVALVGPTGSGKTTIVNLLMRFYDVDRGQILVDGIDIRKIKRSSLRSSIGIV 434 

Query: 420 LQETWLKDATVHELIAYGSEEASRDEVVAATiKAAHftHFFIMQLPKTYDTYLSASDDALSQ 479 

LQ+T L TV E + YG+ A+ +E+ AAK H+ FX LP+ Y+T L+ + + LSQ 
Sbjct: 435 LQDTILFSTTVKENLKYGNPGATDEEIKEAAKLTHSDHFIKHLPEGYETVLTDNGEDLSQ 494 

Query: 480 GQLQLLAIARMFLKKPKVLVLDEATSSIDIRTEAVIQEALKELMRGRTSFIIAHRLSTIQ 539 

GQ QLLAI R PL PK+L+LDEATS++D +TE IQ A+ +LM G+TS IIAHRL+TI+ 
Sbjct: 495 GQRQLLAITRAFLaNPKILIUDEATSNVin'KTEKSIQaftMWKIM 554 

Query: 540 SADLILVMDQGRLVEWGTHaSLMSKNGCYVRL 571 

+ADLI+V+ G +VE G H L+ K G Y L 
Sbjct: 555 NftDLlIVLRDGEIVEMGKHDELIQKRGFYYEL 586 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 340/566 (60%), Positives = 433/566 (76%) 

Query: 11 KKLVQDLLSKKSLVGMALLGTWQVCLTVYLPVLIGQAVDWLSPHSMILLLPIMWKMIA 70 

K+++QDLL K V + ++ + VQV L+VYLPVLIG+AVD+ LS +S L ++ +M+ 
Sbjct: 10 KRVLQDIiKKPLPVCILVIASFVQVGLSVYLPVLIGKAVDMSLSVNSWQTLKWLLGQMLV 69 

Query: 71 VILANTIIQWIOTLLYiniLIFHYVaSLRKAVtffiKUIJLLPIAYLDK^ 130 

+1+ NT+IQW+ PL+Y+RL++ Y L+ ++EK++ LP AYLD++ IGDL+SRV TDTE 
Sbjct: 70 IIWNTLIQWVMPLVYSRLLYQYSQQLKDKLLEKIHRLPPAYLDRQTIGDLVSRVITDTE 129 

Query: 131 QLSNGLLMVFNQFFVGLLTIIVTIFSMAKIDLLMLFLVLFLTPLSLFLARFIAKKSYHLY 190 

QL NGL MVENQF +GLLTI+ TI +MA+ID LML LVL LTP SLFLARFIA+KS+H 
Sbjct: 130 QLINGIOIVENQFILGLLTlLCTIIflMAQIDWLMLILVLVLTPSSLFLaRFIAQKSFHYA 189 

Query: 191 QNQTASRGRQTQFIEEMVSQESLIQAFSAQEESSDHFRTINQEYANPSQSAIFYSSTVNP 250 
Q QT SRG QF EE++ QE L+Q F+AQE+S + +N+ Y SQ AIFY+STVNP 

Sbjct: 190 QAQTKSRGNLAQPTEEILRQEGLVQLFmQEQSICnDYHVLNKTYCEASQKAIFYASTVNP 249 

Query: 251 STRFINSLIYGFLAGIGALRIMSGAFSVGQLITFLNYVNQYTKPFNDISSVLSEMQSALA 310 
+TRFINS+IY LAG+GA+RIM+G FSVGQL TFLN V QYTKPFNDISSVL+E+QS+LA 
Sbjct: 250 ATRFINSVIYALLAGLGAVRIMAGLPSVGQLTTPLNVWQYTKPFNDISSVLAEIQSSLA 309 

Query: 311 CMRLYSILEESSPNITGTEKLDSSTTOSQIDFKNVVFGYNKSKLLLNGINLHIPAffl 370 

CA+RLY +L+ +S VKGQIDP+ V F Y K + LL IN +PAG+KV 

Sbjct: 310 CAQRLYDLLDIEIKEQEHFLTFKASAVKGQIDFEEVSFSYQKDRPLLKDINFSVPAGSKV 369 





Query: 371 AIVGPTGAGKSTLINLIMRFYEVDGGNILLDCKPITDYEPSQLRQEIGMVLQETWLKSAT 430 

AIVGPTGAGKSTIiIlSrL+MRFYE+D G+I LD PI Y +LR G+VLQETWLK AT 
Sbjct: 370 AIVGPTGAGKSTLINLLMRFYELDAGSIKLDKVPIKCYAKEELRSIT6IVLQETWLKDAT 429 



Query: 431 IHDNIAYANPKASREEVIE?iAKAftNM)FFIKQLPNGYDTYLEDAGDSLSQGQCQLLTIAR 490 

+H+ lAY + +ASR+EV+ AAKAA+A FFI QLP YDTYL + D+LSQGQ QLL lAR 
Sbjct: 430 VHELIAYGSEEASRDEWAAAKaAHfiHFFIMQLPiCTYDTYLSASDDALSQGQLQLLAIAR 489 

Query: 491 IFLKLPRILILDEATSSIDTRTEVLVQEAFQMLMKGRTSFIIAHRLSTIQTADIILVMVS 550 

+FLK P++L+LDEATSSID RTE ++QEA + LM+GRTSFIIAHRLSTIQ+AD+ILVM 
Sbjct: 490 MFLKKPKVLVLDEATSSIDIRTEAVIQEALKELMRGRTSFIIAHRLSTIQSADLILVMDQ 549 

Query: 551 GEIVEVGNHSELMAQKGIYYQMC3NAQ 576 

G +VE G H+ LM++ G Y ++Q + 
Sbjct: 550 GRLVEWGTHASLMSKNGCYVRLQKIE 575 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2121 

A DNA sequence (GBSx2237) was identified in S.agalactiae <SEQ ID 6547> which encodes the amino 
acid sequence <SEQ ID 6548>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1099 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2122 

A DNA sequence (GBSx2238) was identified in S.agalactiae <SEQ ID 6549> which encodes the amino 
acid sequence <SEQ ID 6550>. This protein is predicted to be ABC transporter, ATP-binding protein 
(msbA). Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.6477 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35375 GB:AE001710 ABC transporter, ATP-binding protein 
[Thermotoga maritima] 
Identities = 196/570 (34%), Positives = 327/570 (56%), Gaps = 5/570 (0%) 

Query: 1 MKRLTYYFKGYIKETIFGPLFKLLEASFELLVPIVIAKMIDETIPRGDRSGLLLQIGLIF 60 

MK L Y K Y + • PLF ++E +L P ++A+++DE I RGD S L+L+ G++ 
Sbjct: 1 MKTLaRYLKPYWIFAVLAPLFMVVEVICDLSQPTLLARIVDEGIARGDFS-LVLRTGIIiM 59 



Query: 61 FLAA-VGVWAITAQYYSSKAAVGYTRQLTEDLYQKVMSLGKKDRDELGTASLITRLTAD 119 
+ A +G V I ++S A+ + L DL++KV+S + + T+SLITRLT D 
Sbjct: 60 LIVaLIGAVGGIGCWFASYASOHFGRDLRRDLFRKVLSFSISNVimFHTSSLITRLTND 119 

Query: 120 TFQIQTGLNQFLRLFLRAPIIVFGAIIMAFSISPSLTIWFLVMVVTLFIIVFVMSRLLNP 179 
Q+Q + LR+ +RAP++ G I+MA SI+ L+ + ++ + ++ +++ NP 

Sbjct: 120 OTQLQ^rLVMMLLRIVVRAPLLFVGGIVMAVSINVKLSSVLIFLIPPI^^:JLFVWLTKKGNP 179 

Query: 180 lYLKIRTSTDYLVKLTRQQLQGVRVIRAFNQVDRESEAFNDINYHYTNLQLKAGRLSSLV 239 
++ KI+ STD + ++ R+ L GVRV+RAF + + E+E FN + A L 

Sbjct: 180 LFRKIQESTDEVmWRENLLGVRVVRAFRREEYENENFRKANESLRRSIISAFSLIVFA 239 

Query: 240 TPLTFLVVNITLWIIWRCasiIJSIftNHLLSQGMLVALINYLLQILVEIiLKMTM^ 299 

PL +VN+ ++ ++W G + + N+ + G ++A NYL+QI+ L+ + ++ + ++ 
Sbjct: 240 LPLFIFIVNMGMIAVLWPGGVLVRJSlNQMEIGSIMAyTKm^QIMFSIJ^ 299 

Query: 300 YISAKRIIAVF-ERPS-EIIDDKLEPKYSNKALEVQEMAFSYPNSSEKALSDITFSMNVG 357 

SAKR++ V E+P+ E D+ L ++ + + F Y +++ LS + FS+ G 

Sbjct: 300 SASAKRVLEVIJffiKPAIEEMNAtiALENVEGSVSFENVEFRYFEimjP^ 359 

Query: 358 ETMIIGGTGSGKSTLINLLLHIYKVQEGDIDIYHQGKBPDTISNWRTLTOVVPQKIAQnF 417 

+ ++G TGSGKSTL+NL+ + + G +++ + + R + VPQ LF 

Sbjct: 360 SLVAVLGETGSGKSTLMIS&IPRLIDPERGRVEVDELDTOTVKLKDLRGHISAVPQETVLF 419 

Query: 418 KGTIRSNLSLGLGKVSEEKLWTALEIAQASDFVKEKDGQLDAPVESFGRNFSGGQRQRLT 477 
GTI+ NL G +++++ A +IAQ DF+ D+ VE GR1IFSGGQ+QRL+ 

Sbjct: 420 SGTIKENLKWGREDATDDEIVEAAKIAQIHDFIISLPEGYDSRVERGGRNFSGGQKQRLS 479 

Query: 478 lARALVQDKIPFLILDnATSALDYLTEARLFKAITKHENQTNLIIVSQRINSIQNADRIL 537 
IARaLV+ K LILDD TS++D +TE R+ + ++ I++Q+I + AD+IL 

30 Sbjct: 480 lARALVK-KPKVLILDDCTSSVDPITEKRILDGLKRYTKGCTTFIITQKIPTALLADKIL 538 

Query: 538 LLDKGKQVGFDNHQSLLAHNKVYKSIYHSQ 567 

+L +GK GF H+ LL H K Y+ lY SQ 
Sbjct: 539 VLHEGKVAGFGTHKELLEHCKPYREIYESQ 568 





A related DNA sequence was identified in S.pyogenes <SEQ ID 6551> which encodes the amino acid 
sequence <SEQ ID 6552>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.5989 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

!GB:AL137187 putative ABC transporter [Streptomyces ... 296 6e-79 

>GP:CAB69751 GB:AL137187 putative ABC transporter [Streptomyces 
coelicolor A3(2)] 
Identities = 185/569 (32%), Positives = 306/569 (53%), Gaps = 8/569 (1%) 

Query: 1 MKRLRPYVKGYLKESILGPLFKLLEaLFELLVPLLIANMIDISISQHNSQGILRWLTLF 60 
++ LR Y++ Y K L + L+ L +P L A++ID + + +S IL + 

Sbjct: 3 IRLLRTYLRPYKKPIALLVALQFLQTCASLYLPTLNAHIIDEGWKGDSGYILSYGALMI 62 

Query: 61 GLATIGLLLSVTAQYFSSKAAVGFTRQMTDDLFKKIMFLSKEDQDHLGYASLLSRLTSDS 120 
G++ ++ ++ A ++ ++ A R + +F ++ S + H G SL++R T+D 
Sbjct: 63 GISLAQWCNIGAVFYGARTAAfiLGRDVRGAVFDRVQSFSAREVGHFGAPSLITRTTNDV 122 

Query: 121 FQIQTGINQFI,RLFLRAPIIVCGaMVM^m^ISPSLTLWFVM^lVIVLLTLVFVMSHLLGPL 180 

Q+Q L + API+ G +VMA + L+ + +V VL V ++ L PL 

Sbjct: 123 QQVQrEiRIOTPTLMVSAPIMCTGGIVMMXSLDVPLSGVLI/SVVPVia, 182 

Query: 181 YLLIRRETDHLTOLTSQQLQGIRVIKaENQTQKELQJiFKQQimLLSRHQYQRATLftN^ 240 

+ ++ D + R+ +Q+ G RVI+AF + + E Q F++ N L+ L ++ 

Sbjct: 183 FRKMQWLDTVireVLREQITGNRVIRaFVRDEYEQQRFRKANTELTEVALGTGNLLALMF 242 

Query: 241 PMTFLVVNLTLLILIWQGSWQVAHRSLSQGMLVALINYLLQiriftELLKMTMLMGTINQSV 300 

P+ WNL+ + ++W G+ ++ + G L A + YL+QH- ++ T + + ++ 
Sbjct: 243 PVVMTVViniiSSIAVVWFGRHRIDSGGMQIGDLTAFLAYMQIVMSVMK^ 302 

Query: 301 TAAKRINQVFVLADEAPLPLLKDGPISTH-LLTIRHLTFTYPGAAEPSLYDIQLSADQGE 359 
A+RI +V P+ ' + H L IR F YPGA EP L I L A GE 

Sbjct: 303 VC3iERIQEVLETESSWPPVAPVTELRRHGHLEIREAGFRYPGAEEPVLRHIDLVARPGE 362 

Query: 360 WIGIIGGTGAGKTTLIDLICOTfSQYSGEISIiNW---QGEVPKTLTEWRNVIAI)VPQKAQ 416 
+IG TG+GK+TL+ L+ + + GE+ +-N + PKTL + V++LVPQK 
Sbjct: 363 TTAVIGSTGSGKSTLK3LVPRLFimTDGEVLWGVIWRm)PKTIjAK WSLVPQKPY 419 

Query: 417 LFKGTIRSOTjLICQSMPISDEELWRALELaQAKEFVaALPEQLEAPVEAFGRHFSGGQRQ 476 

LF GT+ +NL G + +DEELW AL +AQAKEFV+ L L+AP+ G + SGGQRQ 
Sbjct: 420 LFAGTVATNLRYG-NPDATDEELWHALAVAQAKEFVSELEGGLDAPIAQGGTNVSGGQRQ 478 

Query: 477 RIAIARALLKPKPILILDDASSALDNETRGRLFKALKEELSDVLVILVTQSIKNLQFADK 536 

RLAIAR L++ I + DD+ SALD T L L +E ++ V++V Q + ++ AD+ 
Sbjct: 479 RIAIARTLVQRPEIYLFDDSFSALDYATDftALRAELAQETAEATWIVAQRVATIRDftDR 538 

Query: 537 ILVLEQGHQLDFASHDQLKVSNALYQEML 565 

I+VL++G + H +L N Y+E++ 
Sbjct: 539 lyVLDEGRVVGVGRHHELMADNETYREIV 567 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 313/568 (55%), Positives = 428/568 (75%), Gaps = 9/568 (1%) 

Query: 1 MKRLTYYFKGYIKETIFGPLFKLLEASFELLVPIVIAKMIDETIPRGDRSGLLLQIGLIF 60 

MKRL Y KGY+KE+I GPLFKLLEA FELLVP++IA MID +1 + + G+L + +F 
Sbjct: 1 MKRLRPYVRGYLKESILGPLFKLLEALFELLVPLLIANMIDISISQHNSQGILRVVLTLF 60 



Query: 61 FLRAVGVVVAITAQYJSSKftATOTrRQLTEDLYQKVMSLGKKDRDELGTASLITRLTADT 120 

LA +G+++++TAQY+SSKAAVG+TRQ+T+DL++K+M L K+D+D LG ASL++RLT+D+ 
Sbjct: 61 GLATIGLLLSVTAQYFSSKaAVGFTRQMTDDLFKKIMFLSKEDQDHLGYASLLSRLTSDS 120 

Query: 121 FQIQTGLNQFLRLFLRAPIIVFGAIIMAFSISPSLTIWFLVMWTLFIIVFVMSRLLNPI 180 

FQXQTG+NQFLRLFLRAPIIV GA++MA+ ISPSLT+WF++MV+ L +VFVMS LL P+ 
Sbjct: 121 FQIQTGIHQFLRLFLRAPIIVa3aMVMAYWISPSLTLWFV^WIVLLTLVFVMSHLLGPL 180 

Query: 181 YLKIRTSTDYLVKLTRQQLQGTOVIRAENQVDRESEAFNDIiranrEraiQLKAGRLSSLVT 240 

YL IR TD+LV+LT QQLQG+RVI+AFNQ +E +AF N + Q +A L++++ 

Sbjct: 181 YLLIRRETDHLVRLTSQQLQGIRVIKAFNQTQKELQAFKQQNMLLSRHQYQAATLANVLN 240 

Query: 241 PLTFLWNITLWIIWRGNLNIANHLLSQGMLVALINYLLQILVELLKMTMLVTSLNQSY 300 
P+TFLWN+TL+++IW+G+ +A+ LSQGMLVALINYLLQIL ELLKMTML+ ++NQS 
Sbjct: 241 PMTFLV\ra.TLLILIWQGSWQVMRSLSQGMLVALIim,LQILAELLKOTN™GTINQSV 300 

Query: 301 ISAKRIIAVF ERPSEIIDDKLEPKYSNKaLEVQEMAFSYPNSSEKALSDITFSMNV 356 

. +AKRI VF E P ++ D S L ++ + F+YP ++E +L DI S + 

Sbjct: 301 TAAKRINQVFVIADEAPLPLLKD-— GPISTHLLTIRHLTFTYPGAAEPSLYDIQLSADQ 357 

Query: 357 GETLGIIGGTGSGKSTLINLLLHIYKVQEGDIDIYHQGKSPDTISNWRTLVRWPQNAQL 416 

GE +GIIGGTG+GK+TLI+L+ Y G+I + QG+ P T++ WR ++ +VPQ AQL 
Sbjct: 358 6EWIGIIGGTGAGKTTLIDLICQTYSQYSGEISLIWQGEVPKTLTEWRNVIALVPQKAQL 417 

Query: 417 FKGTIRSlSrLSLGLG-KVSEEKLWTALEIAQaSDFVKEKDGQLDAPVESFGRNFSGGQRQR 475 

FKGTIRSNL LG +S+E+LW ALE+AQA +FV QL+APVE+FGR+FSGGQRQR 
Sbjct: 418 FKGTIRSNLLLGQSMPISDEELWRALELAQAKEFVAALPEQLEAPVEAFGRHFSGGQRQR 477 

Query: 476 LTIARMiVQDKIPFLILDDATSALDYLTEftRLFKAITKHFNQimilVSQRINSIQNADR 535 

h IARAL++ K P LILDDA+SALD T RLFKA+ + + +I+V+Q I ++Q AD+ 
Sbjct: 478 LAI2U?ftLLKPK-PILiri)DASSaLDNETRGRLFKaLKEELSDVLVILVTQSIKNI.QFADK 536 

Query: 536 ILLLDKGKQVGFDNHQSLLRHNKVYKSI 563 

IL+L++G Q+ F +H L N ,+Y+ + 
Sbjct: 537 ILVLEQGHQLDFASHDQLKVSNAIiYQEM 564 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2123 

A DNA sequence (GBSx2239) was identified in S.agalactiae <SEQ ID 6553> which encodes the amino 
acid sequence <SEQ ID 6554>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -12.26 Transmembrane 8 - 24 ( 1 - 28) 

Final Results 

bacterial membrane --- Certainty=0.5904 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB84433 GB:AF027868 RAS-related protein [Bacillus subtilis] 
Identities = 53/140 (37%) , Positives = 78/140 (54%) , Gaps = 2/140 (1%) 

Query: 28 VKKVLQYHDLVQNTLAENGSEaNVHLVLSMIYTETKBDAIDVMQSSESISGTTNSITDSH 87 

++++ Y LV+ L G L+L M+Y E+KG D MQSSES+ N ITD 

Sbjct: 49 LERLTDYKPLVEEELESQGLSNYTSLILGMMYQESKGKCaroPMQSSESLGLKRMEITDPQ 108 

Query: 88 TSIKHGVTLLSQNISQAKKAKVDVWTAVQAYNFGSSYIDYVADHGGENSIELAKNYSKNV 147 

S+K G+ + K+ VD+ T +Q+YN G+ YID+VA+HGG ++ ELAK YS+ 

Sbjct: 109 LSVKQGIKQFTLMYKTGKEKGVDLDTIIQSYNMGAGYIDFVAEHGGTHTEELAKQYSEQQ 168 



Query: 148 VA- -PSLGNYNGDTYFYYHP 165 

V PL G+ + +P 
Sbjct: 169 VKKNPDLYTCGGNAKNFRYP 188 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4143> which encodes the amino acid 
sequence <SEQ ID 4144>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -2.66 Transmembrane 8 - 24 ( 7 - 25) 



Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 134/200 (67%), Positives = 165/200 (82%), Gaps = 1/200 (0%) 

Query: 1 MFKFLKRLIALIIIIFIGYRLVIIHENVKKVLQYHDLVQNTLaENGSEANVHLVLSMIYT 60 

MF+ LKR + +++ F+ Y+ +IH NV++VL Y +V+ TLAEN ++ANV LVL+MIYT 
Sbjct: 1 MFRLLKRACSFLLL-FVIYQSFVIHHNVQRVLAYKPMVEKTLAENDTKANVDLVLftMIYT 59 

Query: 61 ETKGDAIDVMQSSESISGTTNSITDSHTSIKHGVTLLSQNISQAKKAKVDVWTAVQAYNF 120 
ETKG DVMQSSES SG NSITDS SI+HGV LLS N++ A++A VD WTAVQAYNF 



Sbjct: 60 ETKGGEMIVMQSSESSSGQKNSITDSQASIEHGWSILLSHNLALAEEAGVDSWTAVQAYNF 119 

Query: 121 GSSYIDWADHGGENSIEI^KireSKMWAPSIiGKmiaGDTYFYYHPrJU:.ISGGKL^ 180 

G++YIDY+A+HGG+N+++IA YSK WAPSLGN +G TYFYYHPLALISGGKLYKNGGN 
Sbjct: 120 GTAYIDYIAEHGGQimTDLATTYSraVVaPSLGNTSGQTYFYYHPL^ 179 

Query: 181 IVYSREVQENLYLIKIMELF 200 

lYYSREV ENLyLI++M LP 
Sbjct: 180 lYYSREVHPNLYLIELMSLF 199 



SEQ ID 6554 (GBS244) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 59 (lane 4; MW 23.1kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 2; MW 48kDa). 

GBS244-GST was purified as shown in Figure 211, lane 5. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2124 

A DNA sequence (GBSx2240) was identified in S.agalactiae <SEQ ID 6555> which encodes the amino 
acid sequence <SEQ ID 6556>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2401 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9837> which encodes amino acid sequence <SEQ ID 9838> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB71302 GB:AJ130879 hypothetical protein [Clostridium 
sticklandii] 

Identities = 32/95 (33%) , Positives = 53/95 (55%) , Gaps = 1/95 (1%) 

Query: 235 LSPEKLADQLFDDNLTARLTFVDELKDAIPGPVQVSDIDHSRQIKKLENQKLSLSNGIEL 294 

LS EK + F++ + + + L A Q+ ++ + +K E QK+ +GIE+ 

Sbjct: 2 LSVEKALETAFEETDEIKAIYKEALSKAGIENEQI-EVSETALKRKFEIQKIITESGIEV 60 

Query: 295 IVPNNVYQDAESVEFIQNPDGTYSXLIKNIQDIQN 329 
+P N Y D +EF+ N DGT S++IKNI +IQ+ 

Sbjct: 61 KIPVNYYGDPSKLEFVftNGDGTVSLVIKNIGNIQS 95 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6557> which encodes the amino acid 
sequence <SEQ ID 6558>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3336 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 
Identities = 246/325 (75%), Positives = 286/325 (87%) 

Query: 6 MMDFYIKQIIIHQFSPNDTELVLSDTPLTLTPRIDDYFRKKLSKVFSDEAKRGyFGEDNV 65 
M+D YIK+I+IHQFSPNDTEL+LSD +++TPRID+YFRKKL+KVFSDEAKRG F +N 
Sbjct: 1 MLDSYIKRIVIHQFSEiroTELIJjSDRLVSITPRIDEYFRKKLAKVFSDEAKRGQFES^^ 60 

Query: 66 FMSHI^DDLYVSSCQIAQLWKEEFVISEDQKTNDLVPIQFDKDGMEHFAFLRISLKEQFA 125 
F + + DDL +S lAQLWKE FVISEDQKTNDLVF+QFDKDG FAFLRI+LKEQFA 
Sbjct: 61 FFTTIGDDLLETSVTIAQLWKEAFVISEDQKHroLVFVQFDKDGEPFPAFLRIALKEQFA 120 

Query: 126 HVSENQEQPITITQNNLPSAAQTPDEALWNKSSKQYYLIEKRIKHNGSFANYFSENLLQ 185 
H+S+N E P T+TQNNLPS QTPDEftLV+N S QYYLIEKR+KHNGSFANYFSE+LL+ 
Sbjct: 121 HLSDimiHPFTVTQNlJLPSPTQTPDEALVINLKSGQYYLIEKRVKHNGSFAOTFSEH^ 180 

Query: 186 VQPEQSWKSIKMVEQTAQKIAENENKDDFSFQSKMKSAIYKNLEEEQELSPEKLADQLF 245 
V PEQSVKKSIKM+EQTAQKIAE+FN+DDF+FQSKMKS ++K LE + LSPEKLADQLF 
Sbjct: 181 VTPEQSVKKSIKMIEQTAQKIAEHFNQDDFTFQSKMKSTLFKQLEADDVLSPEKLADQIjF 240 

Query: 246 DDNLTARLTFVDELKDAIPGPVQVSDIDHSRQIKKLENQKLSLSNGIELIVPNNVYQDAE 305 
DniSILTARLTFVD++KD IP P+++SDI+HSRQIKKLENQKLSLSNGIEL VPN +YODAE 
Sbjct: 241 DDlSniiTARLTFVDQVKDVIPEPIKISDIEHSRQIKKLENQKLSLSNGIELTVENAIYQDAE 300 

Query: 306 SVEFIQNPDGTYSILIKNIQDIQNK 330 
+VEP+ N DGTYSIIiIKNI+DI+ K 
Sbjct: 301 AVEFLiaroDGTYSILIKNIEDIRTK 325 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2125 

A DNA sequence (GBSx2241) was identified in S.agalactiae <SEQ ID 6559> which encodes the amino 
acid sequence <SEQ ID 6560>. This protein is predicted to be Serine hydroxymethyltransferase (glyA-1). 
Analysis of this protein sequence reveals the following: 
Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3876 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD35802 GB:AE001743 serine hydroxymethyltransferase [Thermotoga maritima]
Identities = 243/416 (58%), Positives = 307/416 (73%), Gaps = 7/416 (1%)



Query: 9 KEFDQELWQAIHDEEIRQQNNIELIASENVVSKAVMAAQGSVLTKKYAEGYPSHRYYGGT 68
         K+ D E+++ + +E RQ+ +ELIASEN S AV+ GS+LTNKYAEGYP RYYGG
Sbjct: 6 KQVDPEIYEVLVNELKRQEYGLELIASENFASLAVIETMGSMLTNKYAEGyPKKRYYGGC 65

Query: 69 DCVDVVESLAIERAKTLFNAEFAOTQPHSGSQaNAAAYMALIEPGDTVLGMDLAAGGHLT 128
          + VD E AIERAK LF A+FANVQPHSGSQRN A Y+AL +PGDT++GM L+ GGHLT
Sbjct: 66 EWVDRAEERAIERAKRLFGAKFANVQPHSGSQANMAVYLALAQPGDTIMGMSLSHGGHLT 125

Query: 129 HGASVSFSGKTYHFVSYSVDPKTEMLDYDNILKIAQETQPKLIVAGASAYSRIIDFEKER 188
           HGA V+FSGK + V Y V+ +TE +DYD + ++A E +PK+IVAG SAY+RIIDF++FR
Sbjct: 126 HGAPVNFSGKIFKWPYGVNLETETIDYDEVRRLALEHKPKIIVAGGSAYARIIDFKRFR 185

Query: 189 QIADAVDAYL^OTD^mHIAGLVASGHHPSPIPYAHVTTTTTHKTLRGPRGGLILTNDEAIA 248
           +IAD V AYIMVDMAH AGLVA+G HP+P+ YAHV T+TTHKTLRGPRGGLILTND lA
Sbjct: 186 EIADEVGAYLMVDMAHFAGLVAAGIHPNPLEYAHVVTSTTHKTLRGPRGGLILTNDPEIA 245

Query: 249 KKINSAVFPGLQGGPLEHVIAAKAVALKEAIiDPSFKIYGEDIIBOlAQaMAKVFKEDDDFH 308
           K ++ +FPG+QGGPL HVIAAKAV KEa+ FK Y + ++KNA+ MA+ F++ +
Sbjct: 246 KAVDKTIFPGIQGGPLrraVIAAKftVCFKEAMTEEFKEYQKQVVK^^ 304

Query: 309 LISDGTDNHLFL^7D^r^KVIENGKKAQ]mlEEVNITLNKNSIPFERLSPFKTSGIRIGTPA 368
           ++S GTD HIiFLVD+T GK A+ LE IT+NKN+IP E+ SPF SGIRIGTPA
Sbjct: 305 IVSGGTDTHLFLVDLTPKDITGKAAEKMjESCGITVNKNTIPNEKRSPFVASGIRIGTPA 364

Query: 369 ITSRGMGVEESRRIAELMIKRLKN--HENQDVI1TEWQE IKSLTDAFPLYEN 418
           +T+RGM EE IAE++ L N EN V EVR+E ++ L + FPLY +
Sbjct: 365 VTTRGMKEEEMEEIAEMIDLVLSNVIDENGTVKPEVREEVSKKVRELCERFPLYRD 420

A related DNA sequence was identified in S.pyogenes <SEQ ID 6561> which encodes the amino acid
sequence <SEQ ID 6562>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.00 Transmembrane 196 - 212 ( 196 - 212)

Final Results

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB15707 GB:Z99122 serine hydroxymethyltransferase [Bacillus subtilis]
Identities = 250/407 (61%), Positives = 311/407 (75%), Gaps = 2/407 (0%)

Query: 14 DKELWDAIHAEEERQEHHIELIASENMVSKAVMAAQGSVLTNKYAEGYPGNRYYGGTECV 73
          D+++++AI E ERQ+ lELIASEN VS+AVM AQGSVLTNKYAEGYPG RYYGG E V
Sbjct: 8 DEQVENAIKNERERQQTKIELIASENFVSEAVMEAQGSVLTNKYAEGYPGKRYYGGCEHV 67

Query: 74 DIVETLAIERAKKLFGAAFAlWQftHSGSQKNftAAYMALIEZMSnm^ 133
          D+VE +A +RAK++PGa NVQ HSG+QftN A Y ++E GDTVLGM+L+ GGHLTHGS
Sbjct: 68 DVVEDIARDRAKEIFGAEHVNVQPHSGAQAmAVYFTIIiEQGDTVIXSMN^ 127

Query: 134 PVNFSGKTYHFVGYSVDTDTEMLNYEAILEQAKAVQPKLIVAGASAYSRSIDFEKFRAIA 193
           PVNFSG Y+FV Y VD +T+ ++Y+ + E+A A +PKLIVi«3ASAY R+IDF+KFR lA
Sbjct: 128 PVNFSGVQYNFVEYGVDKETQYIDYDDVREKRIAHKPKLIVAGASAYPRTIDFKKFREIA 187

Query: 194 DHVGAYLMVDMAHIAGLVRAGVHPSPVPYAHITOSTTHKTLRGPRGGLILTraDEAIAK^ 253
           p ViSAY MVDMftHIAGIiVAftG+HP+PVPYA VT+TTHKTIiRGPRGG+IIi +E KKI
Sbjct: 188 bEVGAYFMVDMAHIAGLVAAGLHPNPVPYADFVTTTTHRrLRGPRGGMILCREE-FGKKI 246

Query: 254 NSAVFPGLQGGPLEHVIAAKAVAFKEALDPAFKDYAQAIIDNTAAMAAVFAQDDRFRLIS 313
           + ++FPG+QGGPL HVIAAKAV+F EL FK YAQ +1 N +A ++ +L+S
Sbjct: 247 DKSIFPGIQGGPLMHVIAAKAVSFGEVLQDDFKTYAQNVISNAKRLAEALTKEG-IQLVS 305

Query: 314 GGTD)SIHVFLVDVTKVIANGKLA<3*n^LDEVNITIJmiAIPFETLSPFKTSGIRIGCaAIT 373
           GGTDNH+ LVD+ + 6K+A+++LDE+ IT NKNAIP++ PF TSSIR+G AA+TS
Sbjct: 306 GGTDNHLILVDIfiSLSLTGKTAEHVLDEIGITSNKNAIPYDPEKPFVTSGIRI^ 365

Query: 374 RGMGVKESQTIARLIIKALVNHDQETILEEVRQEVRQLTDAFPLYKK 420
           RG + + +1 AL NH+ E LEE RQ V LTD FPLYK+
Sbjct: 366 RGFDGDMiEEVGAIIALALKNHEDEGKLEERRQRVAALTDKFPLYKE 412

An alignment of the GAS and GBS proteins is shown below. 

Identities = 330/417 (79%), Positives = 358/417 (88%)

Query: 1 MIFDKDNFKEFDQELWQAIHDEEIRQQNNIELIASENVVSKA\«4AAQGSVLTISrKYAEGYP 60
         MIFDK N ++FD+ELW AIH EE RQ+++IELIASEN+VSKAVMAAQGSVI.TNKyAEGYP
Sbjct: 3 MIFDKGNVEDFDKELWmiHAEEERQEHHIELIASEimVSKAVMAAQGSVLTNKY^^ 62

Query: 61 SHRYYGGTDCVDWESLAIERAKTLFNAEFANVQPHSGSQANAAAYMALIEPGDTVLGMD 120
          +RYYGGT+CVD+VE+LAIERAK LF A FANVQ HSGSQANftAAYMALIE GDTVLGMD
Sbjct: 63 [...] 122

Query: 121 liAAGGHLTHGASVSFSGKTYHFVSYSVDPKTEMLDYDNILKIAQETQPKLI^ 180
           IiaaGGHLTHG+ V+PSGKTYHPV YSVD TEML+Y+ IL+ A+ QPKLIVAGaSAYSR
Sbjct: 123 [...] 182

Query: 181 IIDFEKFRQIADAVDAYLMVDMAHIAGLVASGHHPSPIPYAHVTTTTTHKTLRGPRGGLI 240
           IDFEKFR IAD V AYLMVDMAHIAGLVA+G HPSP+PYAH+ T+TTHKTLRGPRGGLI
Sbjct: 183 [...] 242

Query: 241 LTNDEAIAKKINSAVFPGLQGGPLEHVIAAKAVALKEALDPSPKIYGEDIIKNaQftM?^ 300
           LTNDEA+AKKINSAVFPGLQGGPLEHVIAAKAVA KEALDP+FK Y + II N JUIA V
Sbjct: 243 LTSroEALAKKINSAVFPGLQGGPLEHVIAAKaVaFKEALDPAFKDYAQaillOTJ^^ 302

Query: 301 FKEDDDFHLISDGTDNHLFLVDVTKVIENGKKAQNVLEEVNITIiNKNSIPFERLSPFKTS 360
           F +DD F LIS GTDNH+FLVDVTKVI NGK AQN+L+EVNITLNKN+IPFE LSPFKTS
Sbjct: 303 FAQDDRFRLISGGTONHWLVDVTKVIANGKIAQNLLDEVNITmKNAIPFETLSPFKTS 362

Query: 361 GIRIGTPAITSRGMGVEESRRIAELMIKALKNHENQDVLTEVRQEIKSLTDAFPIiYE 417
           GIRIG AITSRGMGV+ES+ lA L+IKAL NH+ + +L EVRQE++ LTDAFPLY+
Sbjct: 363 GIRIGCAAITSRGMGVKESQTIARLIIKALVNHDQETILEEVRQEVRQLTDAFPLYK 419



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2126 

A DNA sequence (GBSx2242) was identified in S.agalactiae <SEQ ID 6563> which encodes the amino 
acid sequence <SEQ ID 6564>. Analysis of this protein sequence reveals the following: 
Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2289 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
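
The localisation predictions listed under "Final Results" follow a fixed line layout, so they can be tabulated programmatically. The following is a minimal Python sketch, assuming each line has the cleaned form `bacterial <site> --- Certainty=<value> (<call>) < succ>`. The function names `parse_final_results` and `predicted_site` are hypothetical helpers introduced for illustration only; they are not part of the prediction software used to generate the results.

```python
import re

# Assumed line layout (after OCR clean-up), e.g.:
#   bacterial cytoplasm --- Certainty=0.2289 (Affirmative) < succ>
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\s*-*\s*"
    r"Certainty\s*=\s*([0-9.]+)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every result line found in text."""
    results = {}
    for site, score, call in RESULT_LINE.findall(text):
        results[site] = (float(score), call)
    return results


def predicted_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


sample = """bacterial cytoplasm --- Certainty=0.2289 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>"""

results = parse_final_results(sample)
print(results)
print(predicted_site(results))
```

Lines that still carry extraction damage (split digits, stray characters) would not match this pattern and would need to be cleaned first.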

A related GBS nucleic acid sequence <SEQ ID 9839> which encodes amino acid sequence <SEQ ID 9840> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35934 GB:AE001752 conserved hypothetical protein [Thermotoga maritima] 
Identities = 71/198 (35%), Positives = 114/198 (56%), Gaps = 4/198 (2%)

Query: 1 MOTLGQILEDHGAVIMPTETVYGIFAKALSEEAVNHVYELKKRPRDKft^^ 60
         + + ++L + +1 PTETVYGI A A +EEA +++LK+RP D + ++I F+ +
Sbjct: 17 LKSaAELLRHGEVIIFPTETVYGIGftDAYNEEACKKIFKLKERPADNPLIVHIHSFKQLE 76

Query: 61 KYSKNQPTYLKQLYDAFLPGPLTIIL-EASQEVPHWINSGLLSVGFRMPKHPVTLDMIAN 119
          + ++ +L L F PGPLT+I + S+++P + + L +V RMP HPV L +1
Sbjct: 77 EIAEGYEPHLDFL-KKFWPGPLTVIFRKKSEKIPPWTADLPTVAVRMPAHPVALKLIEL 135

Query: 120 HG-PLIGPSfiNISGCDS6RVFSEIQRQENHQV-LGIEDDKALTGVDSTIIDLSGDRVKIL 177
           G P+ PSANISG S + + F +V L 1+ G++STI+DL+ ++ +L
Sbjct: 136 FGHPIAAPSftNISGRPSATNVKHVIEDFMGKVKLIIDAGDTPFGLESTIVDLTKEKPVLL 195

Query: 178 RQGAITQEVLTATIPELI 195
           R G + EL PEL+
Sbjct: 196 RPGPVEVERLKELFPELV 213





A related DNA sequence was identified in S.pyogenes <SEQ ID 6565> which encodes the amino acid
sequence <SEQ ID 6566>. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0282 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 127/196 (64%), Positives = 154/196 (77%)

Query: 1 ^l^roLGQILEDHGAVIMPTETVYGIFAKALSEEAVNHVYELKKRPRDKAMNLNICDFETIL 60
         M L I+E A+++PTETVYG+FAKAL E+AVN VY+LK+RPRDKAMNLN+ DP +IL
Sbjct: 11 mYLASIIESGDALVLPTETWGIiFAKALDEKAVNAVYDLKQRPRDKAMNIj^ 70

Query: 61 KySKNQPTYLKQL'XDAFLPGPLTIILEASQEVPHWINSGLLSVGFRMPKHPVTIiDMIANH 120
          +SK QP iniiK+Ly AFLPGPLTIIL+A+ +VP+WINSGL +VGFR+P HP+T +1
Sbjct: 71 AFSKEQPRYLKKLyQAFLPGPLTIILKANDQVPYWINSGLSTVGFRLPSHPITAJiLIQKr 130

Query: 121 GPLIGPSANISGCDSGRVFSEIQKQFNHQVLGIEDDKALTGVDSTIIDLSGDRVKILRQG 180
           GPIiXGPSAN+SG SGRVF I + F+ QV G DD LTG DSTI+DLSG+R IIiRQG
Sbjct: 131 GPLIGPSANLSGKASGRVFDHIMQDEDFQVFGYADDPFLTGKDSTILDLSGERAVILRQG 190

Query: 181 AITQEVLTATIPELIF 196
           AIT+E L A +PEL F
Sbjct: 191 AlTKEELLANVPEIiRF 206





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2127 

A DNA sequence (GBSx2243) was identified in S.agalactiae <SEQ ID 6567> which encodes the amino 
acid sequence <SEQ ID 6568>. This protein is predicted to be protoporphyrinogen oxidase (hemK). 
Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07493 GB:AP001519 protoporphyrinogen oxidase [Bacillus halodurans]
Identities = 94/236 (39%), Positives = 132/236 (55%), Gaps = 12/236 (5%)

Query: 49 DTDQQLMENIFQQDKKHRSP---QYITGKAYFRDLIFFVDERVLIPRPETEELVDLILSE 105
          + D+L + + + L HS Q++ OFF VD+ VLIPRPETEELV +L E
Sbjct: 46 ELDGELFQRLEEDLAAHASGVPVQHLIGVESFYGRQFQVDQHVLIPRPETEELVLAVLKE 105

Query: 106 NKVEDCSVLDIGTGSGAmiSLKKERPSWDVLASDISVSALDLAKENANNCDAEV 160
           K E+ ++LDI6TGSGAIA++L E +V A DIS AL +A +NA A V
Sbjct: 106 IRRQFKKEEEITILDlGTGSGAmVTLALEEERTNVTAVDISRDAiiQVAADliaRRLGANV 165

Query: 161 TFIESDV FSNISGKFDIIVSNPPYISYNDKDEVGKNVLASEPHSALFADEEGIiAIYR 217
           I D+ F +FD+IVSNPPYI +KD + +V EP ALF +GL +YR
Sbjct: 166 QLlHGDU3EPFLKTOERFDVIVSNPPYIPTVEKDTIAVHVRDHEPAIALFGGVDGIiDVYR 225

Query: 218 KIIENSREYL-QPRGKLYFEIGYKQGDDLRSLLKRYFPNNRCRVLKDircKDRMW 272
           +++ + +G .+ EIG QG D+ L++ +P VL D+ GKDR+V+
Sbjct: 226 RLMSQLPALTKEEKGMVALEIGAGQGMDVEKLMQTAYPKAAVDVLYDIiNGKDRIVL 281

A related DNA sequence was identified in S.pyogenes <SEQ ID 6569> which encodes the amino acid
sequence <SEQ ID 6570>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4324 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 174/274 (63%), Positives = 207/274 (75%)

Query: 1 MNYAQIjIKHYGQLLEaCGEEVENFIYVLKDLKQWSTTDYLtNQNSSVSDTDQQLMENIFQ 60
         MNYA LI+ y LE E+ EN YV +++K+WS+ D L++QN +V+ D Ii+E+IF
Sbjct: 1 IWYATIiIRTYEDKLEQIDEDREtnAYVFREIKEWSSLDMLIHQNQAVTPEDAVLLEHIFC 60

Query: 61 QLKKHRSPQYITGKAYFRDLIFFVDERVLIPRPETEELVDLILSENKVEDCSVLDIGTGS 120
          L +H SPQYITG AYFRDL VD+RVDIPRPETEELVD+IL+EN +VLDIGTGS
Sbjct: 61 SLSQHLSPQYITGNAYFRDLKLAVDKRVLIPRPETEELVDMILAENLDAPLNVLDIGTGS 120

Query: 121 GAIAISLKKERPSWDVLASDISVSALDLftKENSNNCnaEVTFIESIOT 180
           GAIAISLKKERP+W V ASDIS +AI1DLAK NA+ ++TFIESDVFS IS FDIIVS
Sbjct: 121 GamiSLKKERPtWQVTASDISRAALDLAKANADAYQLDITFIESDVFSLISETFDIIVS 180

Query: 181 NPPYISYimKDEVGKimASEPHSALFaDEEGLAIYRKIIENSREYLQPRGKLYFEIGYK 240
           NPPYISY DK+EV NVL SEPH ALFA E G AIYRKIIE + YL GKLYFEIGYK
Sbjct: 181 NPPYISYEDKEEVSrJm,QSEPHLALFAKEiraYAIYRKIIEQADNYLTKEGKLYFEIGYK 240

Query: 241 QGDDLRSLLKRYFPNNRCRVLKDIFGKDRMWLD 274
           Q + ++ +L+ YFP R + DIFGK+RMW+D
Sbjct: 241 QAEGIKDMLQAYFPQRHIRAVTDIFGKERMWVD 274

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2128 

A DNA sequence (GBSx2244) was identified in S.agalactiae <SEQ ID 6571> which encodes the amino
acid sequence <SEQ ID 6572>. This protein is predicted to be peptide chain release factor RF-1 (prfA).
Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3446 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15718 GB:Z99122 peptide chain release factor 1 [Bacillus subtilis]
Identities = 211/351 (60%), Positives = 280/351 (79%), Gaps = 1/351 (0%)

Query: 5 DQLQAVEDRYEELGELLSDPDWSDTKRFMELSREEASTRETVTAYREYKQVIQNISDAE 64
         D+L+++E+RYE+L ELLSDP+W+D K+ E S+E++ +ETV YR+Y+ + ++DA+
Sbjct: 3 DRIJCSIEERYEKIjNELLSDPEVVNDPKKtiREYSKEQSDIQETVDVYRQYRDASEQLADAK 62

Query: 65 EMIKDASGDAELEEMAKEELKESKAAKEEYEERLKILLLPKDPNDDKNIILEIRGAAGGD 124
          M+++ DAE+ +M KEE+ E + E ERLK+LL+PKDPKDDKN+I+EIRGAAGG+
Sbjct: 63 AMLEEKL-DAEMRDMVKEEISELQKETETLSERLK^MiIPKDPNDDKNVIMEIRGRAGGE 121

Query: 125 EARLFAGDLLTMYQKyAETQGWRFEVMESSWGVGGIKEWAMVSGQSVYSKLKYESGftH 184
           EAALFAG+L MY +YAE QGW+ EVME++V G GG KE++ M++G YSKLKYE+GAH
Sbjct: 122 EaALFAGNLYRMYSRYiffiLQGWKTEVMEUWmXSTGGYKEIIFMITGSGAYSK^ 181

Query: 185 RVQRVPVTESQGRVHTSTATVLVMPEVEEVEYEIDQKDLRVDIYHASGAGGQNV^^ 244
           RVQRVP TES GR+HTSTATV +PE EEVE +1 +KD+RVD + +SG GGQ+VN +A
Sbjct: 182 RVQRVPETESGGRIHTSTATVAGLPEAEEVEVDIHEKDIRVDTPASSGPGGQSVNTTMSA 241

Query: 245 VRMVHIPTGIKVEMQEERTQQKHRDKAMKIIRARVADHFAQIAQDEQDAERKSTVGTGDR 304
           VR+ H+PTG+ V Q+E++Q KN++KftMK++RAR+ D F Q AQ E D RKS VG+GDR
Sbjct: 242 VRLTHLPTGVWSCQDEKSQIKNKEKAMKVLRARIYDKFQQEAQaEYDQTRKSAVGSGDR 301

Query: 305 SERIRTYNFPQNRVTDHRIGLTLQKLDTILSGKMDEVIDALVMYDQTQKIiE 355
           SERIRTYNFPQNRVTDHRIGLT+QKLD IL GK+DEV++AL++ DQ KL+
Sbjct: 302 SERIRTYNFPQNRVTDHRIGLTIQKLDQILEGKLDEWEALIVEDQASKLQ 352

A related DNA sequence was identified in S.pyogenes <SEQ ID 6573> which encodes the amino acid
sequence <SEQ ID 6574>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3446 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 349/358 (97%), Positives = 354/358 (98%)

Query: 1 MNIYDQLQAVEDRYEELGELLSDPDWSDTKRFMELSREEASTRETVTAYREYKQVIQNI 60
         MNIYDQLQAVEDRYEELGEIiLSDPDWSDTKRPMELSREE +TRETVTAYREYKQVIQ I
Sbjct: 1 MNIYDQI<3AVEDRYEEICELLSDPDWSDTKRFMELSREEmrRETVTAYREYKQVIQTI 60

Query: 61 SDAEEMIKDASGDAELEEMAKEEIiKESKAAKEEYEERLKILLLPKDPNDDKNIILEIRGA 120
          SDAEEMIKDASGD ELEEMAKEELKESKAAKEEYEE+LKIIiLLPKDPNDDKNIILEIRGA
Sbjct: 61 SDAEEMIKDASGDPELEEMAKEELKESKaAKEEYEEKLKILLLPKDPNDDKNIIIiEIRGA 120

Query: 121 AiGGDEAALFAGDLLTMYQKyAETQGWRFEVMESSWGVGGIKEVVaMVSGQSVYSKLKYE 180
           AGGDEAALFAGDLLTMYQKXaETQGWRFEVmSSTOGVGGIKEVVaMVSGQSVySKliKyE
Sbjct: 121 AGGDEaaLFAGDLLTMYQKYAETQGWRFEVMESSVNGVGGIKEWaMVSGQSVYSKLKyE 180

Query: 181 SGAHRVQRVPVTESQGRVHTSTATVLVMPEVEEVEYEIDQKDLRVDIYHASGAGGQNVNK 240
           SGAHRVQRVPVTESQGRVHTSTATVLVMPEVEETOY+ID KDLRVDIYHASGAGGQNVNK
Sbjct: 181 SGAHRVQRVPVTESQGRVHTSTATVLVMPEVEEVEYDIDPKDLRVDIYHASGAGGQNVNK 240

Query: 241 VATAVRMVHIPTGIKVEMQEERTQQKNRDKftMKIIRARVADHFAQIAQDEQDAERKSTVG 300
           VATAVRMVHIPTGIKVEMQEERTQQKNRDKAMKIIRARVADHPAQIAQDEQDAERKSTVG
Sbjct: 241 VATAVRMVHIPTGIKVEMQEERTQQKNRDKAMKIIRARVADHFAQIAQDEQDAERKSTVG 300

Query: 301 TGDRSERIRTYNFPQNRVTDHRIGLTLQKLDTILSGKMDEVIDALVMYDQTQKLEALN 358
           TGDRSERIRTYNFPQNRVTDHRIGLTLQKLDTILSGKMDEVinaLVMYDQT+KLE+IiN
Sbjct: 301 TGDRSERIRTYNFPQITOVTDHRIGLTLQKLDTILSGK^roEVIDALVMYDQTKKLESm 358

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 2129

A DNA sequence (GBSx2245) was identified in S.agalactiae <SEQ ID 6575> which encodes the amino 
acid sequence <SEQ ID 6576>. This protein is predicted to be thymidine kinase (tdk). Analysis of this 
protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2244 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9841> which encodes amino acid sequence <SEQ ID 9842> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB02289 GB:L40415 thymidine kinase [Streptococcus gordonii]
Identities = 158/189 (83%), Positives = 175/189 (91%)

Query: 1 MZ^JLYYKYGTMNSGKTIEILro/miYEEQGKPWIMTSALDTRDEFGWSSRIGM^^ 60
         MAQLYYKYGTMNSGKTIEILKVAHNYEEQGK WIMTSA+DTRD G VSSRIGM+R+A+
Sbjct: 1 MAQLYYKyGTMNSGKTIEIIiKVaHNYEEQGKGWIMTSAVDTRDGVGWSSRra 60

Query: 61 PISDDMDIFSYIQNLPQKPYCVLIDECQFLSKKNVYDLARVVDDLDVPVMAFGLKNDFQN 120
          I DD DI YI+NLP+KPYC+LIDE QFL + +VYDIARWD+LDVPVMAFGLKNDF+N
Sbjct: 61 AIEDDTDHjGYIKNLPEKPYCILIDEAQFLKRHHVYDLARWDELDVPVMAFGLKNDFRN 120

Query: 121 NLFEGSKHLLLLADKIDEIKTICQYCSKKATMVLRTENGKPVYEGDQIQIGGNETY 180
           LFEGSKHLLIiADKI+EIKTICQYCS+KATMVLRT++GKPVY+G+QIQIGGNETyiFVC
Sbjct: 121 ELFEGSkHIiLIirJU)KIEEIKTICQYCSRKRTMVIiRTDHGKPVYDGEQIQIGGNETYIP^^ 180

Query: 181 RKHYFNPDI 189
           RKHYF PDI
Sbjct: 181 RKHYFKPDI 189

A related DNA sequence was identified in S.pyogenes <SEQ ID 6577> which encodes the amino acid
sequence <SEQ ID 6578>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2244 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 174/189 (92%), Positives = 184/189 (97%)

Query: 1 MAQLYYKYGTMNSGKTIEILKAfflHNYEEQGKPVVIMTSALDTRDEFGVVSSRIGMRREAV 60
         +AQLYYKYGTMNSGKTIEILKVifflNYEEQGKPVVIMTSALDTRD FG+VSSRIGMRREA+
Sbjct: 1 lAQLYYKYGTMNSGKTIEILKVAHKYEEQGKPVVIMTSALDTRDGFGIVSSRIGMRREAI 60

Query: 61 PISDDMDIFSYIQNLPQKPYCVLIDECQFLSKKNVYDLARVVDDLDVPVMAFGLKNDFQN 120
          PIS+DMDIF++I L +KPYCVLIDE QFLSK+NVYDLARWD+L+VPVMAFGLKNDFQN
Sbjct: 61 PISNDmiFTFIAQLEEKPYCVLIDESQFLSKQNVYDLARVVDEIiim>VMAFGLKISlDFQN 120

Query: 121 NLFEGSKHLLLLADKIDEIKTICQYCSKKRTMVLRTENGKPVYEGDQIQIGGNETYIPVC 180
           NLFEGSKHLLLLADKIDEIKTICQYCSKKATMVLRTENGKPVYEGDQIQIGGNETYIPVC
Sbjct: 121 NLFEGSKHLLLLADKIDEIKTICQYCSKKRTMVXiRTENGKPVYEGDQIQIGGNETYIPVC 180

Query: 181 RKHYFNPDI 189
           RKHYFNPDI
Sbjct: 181 RKHYFNPDI 189

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2130 

A DNA sequence (GBSx2246) was identified in S.agalactiae <SEQ ID 6579> which encodes the amino 
acid sequence <SEQ ID 6580>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3995 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA2604e GB:M95650 4-oxalocrotonate tautomerase [Plasmid pWW0]
Identities = 27/60 (45%), Positives = 36/60 (60%)

Query: 1 MPFVKIDLFEGRSQEQKNELAREVTEWSRIAKAPKENIHVFINDMPEGTYYPQGELKKK 60
         MP +1 + EGRS EQK L REV+E +SR AP ++ V I +M +G + GEL K
Sbjct: 1 MPIAQlHILEGRSDEQKETIilREVSEAISRSLDAPLTSVRVIITEMRKHHFQIGGEIiASK 60

A related DNA sequence was identified in S.pyogenes <SEQ ID 6581> which encodes the amino acid 
sequence <SEQ ID 6582>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4128 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 56/60 (93%), Positives = 59/60 (98%)

Query: 1 MPFVKIDLFEGRSQEQK^ffiLRREVTEVVSRIAKAPKENIHVFIlSIDMPEGTYYPQGELKKK 60
         MPFV IDLFEGRSQEQKN+LRREVTEWSRIAKAPKENIHVFINDMPEGTYYPQGE+K+K
Sbjct: 1 MPFVTIDLFEGRSQEQKNQIJUIEVTEVVSRIAKAPKENIIIVPINDMPEGTYYPQGEMKQK 60 
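
Each alignment is preceded by a summary line giving the identities and positives as a fraction and a percentage. As a hedged illustration, the Python sketch below extracts those fields and recomputes the percentage from the fraction. The names `parse_summary` and `recomputed_percent` are hypothetical, and rounding down to a whole percent is an assumption about how the reported figures were produced; a recomputed value may therefore differ from a reported one by a point.

```python
import re

# Matches fields such as "Identities = 56/60 (93%)"; optional spaces are tolerated.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_FIELD.findall(line)
    }


def recomputed_percent(count, length):
    """Whole-number percentage, rounded down (assumed convention)."""
    return count * 100 // length


summary = parse_summary("Identities = 56/60 (93%), Positives = 59/60 (98%)")
for name, (count, length, reported) in summary.items():
    print(name, count, length, reported, recomputed_percent(count, length))
```

Comparing the recomputed and reported percentages is a quick way to flag summary lines whose digits may have been damaged during text extraction.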

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2131

A DNA sequence (GBSx2247) was identified in S.agalactiae <SEQ ID 6583> which encodes the amino
acid sequence <SEQ ID 6584>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2154 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9843> which encodes amino acid sequence <SEQ ID 9844>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC65759 GB:AE001250 conserved hypothetical protein [Treponema pallidum]
Identities = 103/317 (32%), Positives = 163/317 (50%), Gaps = 15/317 (4%)

Query: 7 QLSHSLRLMGTTIDIQINSKNAQKQIR EVIELLELYKNRPSANDHtlSELMAIlinSlNa 62
         + S + ++GT +++ SK ++ EV LL+ + SSN +S L A+N A
Sbjct: 31 EYSRAELVIGTLOlVRVYSKRPAaEVHAALEEVFTLLQQQEMVLSAinaJDSALftaiaB^ 90

Query: 63 GIKPIQVHPDLFELITIGKEHSLARPSNLNIAIGPLVQTWRIGFSDAKLPSPSEISEAMI 122
          G P+ V L+ L+ + N A+G V+ W IGF A +P P + EA+
Sbjct: 91 GSAPVVVDRSLYALLERAIiPFAEKSGGAENPAIX33iXVKLWNIGFDRAAVPDPDALKEALT 150

Query: 123 LSDPTHILLDSN KQSVFMIQIGMKIDLGALAKGYIADKIMTYLKNEMIDSAIIHL 177
           D + L + +V L Q GM++DLGA+AKG++ADKI+ Ii +DSA+++L
Sbjct: 151 RCDFRQVHLRAGVSVGAPHTVQIiAQAGMQLDLGAIAKGFLADKIVQLLTAHALDSALVDL 210

Query: 178 GGNV LVHGDNPNRSEGY--WVIGIQHPKKKRGKNIGTVKIKNQSWTSGTYERRLI 231
           GGN+ h +GD + + W +GI+ P K V +++ SWTSG YER
Sbjct: 211 GGNIFALGLKYGDVRSAAAQRLEWNVGIRDPHGTGQKPALWSVRDCSWTSGAYERFFE 270

Query: 232 IDDKEYHHIFDRQTGYPIQTEMASISIVSKQSVDCEimTRLFGLSIKEaiiDinNAVSYI 291
           D YHHI D TG+P T++ S+SI + +S D + T F L +++ +L +
Sbjct: 271 RDGVRYHHIIDPVTGFPAHTDVDSVSIFAPRSTnADALATACFVLGYEKSCMiLREFPGV 330

Query: 292 EGIIITKDDRIYLSDGL 308
           + + I D R+ S G+
Sbjct: 331 DALFIFPDKRVRASAGI 347

A related DNA sequence was identified in S.pyogenes <SEQ ID 6585> which encodes the amino acid 
sequence <SEQ ID 6586>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1020 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 182/310 (58%), Positives = 232/310 (74%)

Query: 8 LSHSLRLMGTTIDIQINSKNAQRQIREVIELLELYKNRFSANDFNSELMAINNNAGIKPI 67
         ++ L+LMGT IDIQI S A -K^t VI+LL YKNRFSAND NSEIMAIN AG+KP+
Sbjct: 3 VTQQLKLMGTVIDIQIESDKACQQLSRVIDIiLYTYKNRFSANDSNSELMAINQAAGVKPV 62

Query: 68 QVHPDLFELITIGKEHSLftRPSNtNIAIGPLVQTWRIGFSDAKLPSPSEISEBMILSDPT 127
          VH DLF LI IGK HSL+ PSNLNIAIGPLVQ WRIGF DA+H-PS + IS+ + L+DP
Sbjct: 63 SVHSDLF^^:lIQIGKAHSLSTPSNIlNIAIGPLVQAWRIGFEnARVPSHNLISQQLALTDPR 122

Query: 128 HILLDSNKQSVFLNQIGMKIDLGALAKGYIADKIMTYLKNEMIDSAIINLGGNVLVHGDN 187
           +L+D KQ+VFL Q+GM +DLGALAKGYI DKIM YL + IDSA+INLGGNV VHG N
Sbjct: 123 QVLIDDKKQTWLQQVGMAIDLGAIAKGYITDKIMAYLIEDGIDSALINLGGNVRVHG^^ 182

Query: 188 PNRSEGYWVIGIQHPKKKRGKNIGTVKIKNQSVVTSGTYERRLIIDDKEYHHIPDRQTGY 247
           P + + IGIQ P KRG+++G +K+ N SWTSG YER+ K+YHHI DRQTGY
Sbjct: 183 PKSPDKTFRIGIQKPDAKRGQHUSVIKWmHSVVTSGIYERQFTSKGKQYHHILDRarGY 242

Query: 248 PIQTEMASISIVSKQSTOCEIWTTRLFGLSIKEALDILNAVSYIEGIIITKDDRIYLSDG 307
           PI+T+M S++I++ S C+IWTTRLFGIi + +LN IEG+++T+ + +S+G
Sbjct: 243 PIETDMLSLTIMaPSSPYCDIWTTRLFGIJDSSMIITIiIiIWFDNIEGLLVTRKHHVLMSNG 302

Query: 308 LKHHFQLFYH 317
           L+H+FQ +YH
Sbjct: 303 LRHYFQPYYH 312

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2132 

A DNA sequence (GBSx2248) was identified in S.agalactiae <SEQ ID 6587> which encodes the amino 
acid sequence <SEQ ID 6588>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.0966 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG18632 GB:AY007504 unknown [Streptococcus mitis]
Identities = 92/160 (57%), Positives = 119/160 (73%), Gaps = 1/160 (0%)

Query: 1 MKLIGIVGTNSNKSTNRQLLQYMQQHFADKAEIELIEVKDLPLFNKPADKNVPQVILDIA 60 

MKL+ IVG'mSN+STNR+LL++MQ+HF+DKA+IE++E+K LP EN+P D+ P + + 
Sbjct: 1 MKLVAIVGTNSNRSTNRKLLKFMQKHFSDKADIEVLEIKQLPAFISIEPEDEQAPAEVQAFS 60 

Query: 61 AKIEETDGVIIGTPEYDHSIPSAmSVLAWLSYGIYPLLNKPVMITGASYGTLGSSRAQL 120 

KI DGVII TPEYDH+IP+ L S L W++Y L+NKP MI GAS G LG+SRAQ 
Sbjct: 61 EKILAflDGVIISTPEYDHTIPAPLASALEWIAYTSRALINKPTMIVGASLGLLGTSRAQA 120 

Query: 121 QLRQIMJAPELKASVLP-DEFLLSHSLQAFDKDGNLHDIE 159 

LRQIL+APELKA V+P EF L HS Q D + +L++ E 
Sbjct: 121 HLRQILDAPELKARVMPGTEFFLGHSEQVLDDECHtNNPE 160 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6589> which encodes the amino acid 
sequence <SEQ ID 6590>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB62679 GB:AL133422 putative secreted protein. [Streptomyces coelicolor A3(2)]
Identities = 68/192 (35%), Positives = 94/192 (48%), Gaps = 25/192 (13%)

Query: 4 ILFIVGSLREGSFiraQLAAQAQK-ALEHQAVVSYIJinjKDVPVLNQDIEANAPLPVVDA-- 60
         IL +VGSI.R GS N QIiA A + A E V + ++P N+DI+ +P A
Sbjct: 5 ILALVGSLRAGSHNRQLAEAAVRFAPEGAEVQLFEGLAEIPFYNEDIDVEGSVPAAAAKL 64

Query: 61 RQAVQSADAIWIFTPVYNPSIPGSVKNLLDWLSRAmLSDPTGPSAIGGKWTVSSVftNG 120
          R+A Q A A +F+P YN +IP +KN +DWLSR PGAGKVV AG
Sbjct: 65 REAAQGAQAFLLFSPEYNGTIPAVLKNAIDWLSR PYGAGAFTGKPVAWGTAFG 118

Query: 121 GHDQVFDQFKft. LI>PFIRTSVft3EFTK-ATVNP--DAWGTGRLEISKETKA 167
           + V+ Q +A ++ 1+ S+ G T+ A +P DA . +L E A
Sbjct: 119 QYGGVWAQDEMlKAVGIAGGKVIEDIKLSIPGSVTREaETHPADDAEVaAQL---TEVVA 175

Query: 168 NLLSQAEALIAA 179
           L A+ +AA
Sbjct: 176 RLHGHADEAIAA 187

An alignment of the GAS and GBS proteins is shown below.

Identities = 28/90 (31%), Positives = 49/90 (54%)

Query: 3 LIGIVGTNSNKSTNRQLLQYMQQHFADKAEIELIEVKDLPLFNKPADKNVPQVILDIAAK 62
         ++ IVG+ S N QL Q+ +A + + KD+P+ N+ + N P ++D
Sbjct: 4 ILFIVGSIJlE»SEISIHQIJUiQAQKaiiEHQAWSY]aWKI)VPVIJSrQDIEaN^ 63

Query: 63 lEETDGVIIGTPEYDHSIPSALMSVLAWLS 92
          ++ D + I TP Y+ SIP ++ ++L WLS
Sbjct: 64 VQSADAIWIFTPVYNFSIPGSVKNLLDWLS 93

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2133 

A DNA sequence (GBSx2249) was identified in S.agalactiae <SEQ ID 6591> which encodes the amino 
acid sequence <SEQ ID 6592>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1160 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2134 

A DNA sequence (GBSx2250) was identified in S.agalactiae <SEQ ID 6593> which encodes the amino 
acid sequence <SEQ ID 6594>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2132 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG18632 GB:AY007504 unknown [Streptococcus mitis]
Identities = 80/162 (49%), Positives = 112/162 (68%)

Query: 1 MKFVGIVGSNftEQSYNRMLLEFIRKNFKTKFELEVLEIDDIPMENQDQNWEESFQLRLLN 60
         MK V IVG+N+ +S NR LL+F++K+F K ++EVLEI +P FN+ ++ + +++ +
Sbjct: 1 MKLVAIVGTNSmSOTOKIiLKFMQKHFSDKftDIEVLEIKQLPAFNEPEDEQAPAEVQAFS 60

Query: 61 NKlTRADGVIIATPEHNHTITAALKSVLEWtSFAVHPLENKPVMIVGASYYDQGTSRAQI 120
          KI ADGVII+TPE++HTI'A L S LEW+++ L NKP MIVGAS GTSRAQ
Sbjct: 61 EKILaaDGVIISTPEYDHTIPAPLASALEWIAYTSRMiINKPTMIVGASLGLLGTSRAQA 120

Query: 121 HLRKILDAPGVNAYTLPGNEFIiLGKAKEAFDDNGNIlNPGTV 162
           HLR+ILDAP + A +PG EF LG +++ DD ++ NP V
Sbjct: 121 HLRQILDAPELKARVMPGTEFFIfiHSEQArtiDDBCHLNNPEKV 162

There is also homology to SEQ ID 6596.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

15 Example 2135 

A DNA sequence (GBSx2251) was identified in S.agalactiae <SEQ ID 6597> which encodes the amino 
acid sequence <SEQ ID 6598>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>» Seems to have no N- terminal signal sequence 
20 INTEGRAL Likelihood = -7.32 Transmembrane 13 - 29 ( 11 - 29) 

Final Results 

bacterial membrane — Certainty=0. 3 930 (Affirmative) < suco 

bacterial outside Certainty=0. GOOD (Not Clear) < suco 

25 bacterial cytoplasm — Certainty=0. 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
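
Each "Final Results" block lists three candidate locations with a certainty value. As an illustration only, the sketch below reads such a block and picks the location with the highest certainty. The helper names `parse_final_results` and `most_likely_location` are hypothetical, and the line layout (`location --- Certainty=value (label)`) is assumed from the reports shown here.

```python
import re
from typing import List, Tuple

# Assumed layout: "bacterial membrane --- Certainty=0.3930 (Affirmative) < succ>"
_RESULT = re.compile(
    r"^\s*(bacterial \w+)\s*-*\s*Certainty\s*=\s*([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(text: str) -> List[Tuple[str, float, str]]:
    """Return (location, certainty, label) for every result line in text."""
    rows = []
    for line in text.splitlines():
        m = _RESULT.match(line)
        if m:
            rows.append((m.group(1), float(m.group(2)), m.group(3).strip()))
    return rows


def most_likely_location(rows: List[Tuple[str, float, str]]) -> str:
    """Return the location with the highest certainty (rows must be non-empty)."""
    return max(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    report = (
        "bacterial membrane --- Certainty=0.3930 (Affirmative) < succ>\n"
        "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>\n"
        "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>\n"
    )
    print(most_likely_location(parse_final_results(report)))
```

Lines that do not match the assumed layout are skipped silently, which keeps the sketch tolerant of blank lines and headings around the block.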

Example 2136 

A DNA sequence (GBSx2252) was identified in S.agalactiae <SEQ ID 6599> which encodes the amino 
acid sequence <SEQ ID 6600>. This protein is predicted to be a potential nitrite transporter. Analysis of this
protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.92 Transmembrane 61 - 77 ( 54 - 82)
INTEGRAL Likelihood = -5.57 Transmembrane 106 - 122 ( 103 - 126)
INTEGRAL Likelihood = -5.15 Transmembrane 160 - 176 ( 159 - 177)
INTEGRAL Likelihood = -4.09 Transmembrane 180 - 196 ( 179 - 199)
INTEGRAL Likelihood = -1.01 Transmembrane 233 - 249 ( 233 - 249)

Final Results 

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15832 GB:Z99123 alternate gene name: ipa-48r~similar to
nitrite transporter [Bacillus subtilis]

Identities = 82/253 (32%), Positives = 119/253 (46%), Gaps = 10/253 (3%)



Query: 6 EKIA^CAKKEALYKESLGRYALRSMLAGAYLTMSTAAGIVAADTIGK-ISPALSGFVF- 63 
+K+ KK+ ++ S RY LRS+LA ++ GI AA G A S P F

Sbjct: 7 QKVEQYALKKQNIFASSKIRYVLRSILASIFIGF GITAASKTGSYFFMftDSPFAFP 62 

Query: 64 --AFIFSFGLIYVLIFNGELATSNMLYLTAGAYNKMISWKKRMTILIYCTFFNLVGACIL 121 

A F ++ + G+L T N Y T A K ISW+ + + + NL+GA + 

Sbjct: 63 ARAOTFGAAIMIAYGGGDLFTGNTB^FTYTALRKKISWRDTLYLWMSSYAGNLIGAILP 122 

Query: 122 AWLFNQSYSFQHLTNDSFLGHVVAKKLGKPSSGAFLEGIIAIMFVNLAILAYMLLKEESA 181 

A L + + F+ + SFL H+ K+ P+S F G++ N V LA M LK E A 
Sbjct: 123 AILISATGLFEEPSVHSFLIHLAEHKMEPPASELFFRGMLCNWLVCIiAFFIPMSLKGEGA 182 



Query: 182 KMWILSAIFMFVFLSNEHLIANFASFMIlaAFSHIEHIKGFTLmIIRQWTLVFFG^lWIG 241 

K+ ++ +F F EH IAN +F ++ lEH TL+ +R V GN 

Sbjct: 183 KLFTMMLFVFCFFISGFEHSIAHMCTFAISIiL--IEHPDTVTIiMGAVRNLIPVTLGl^ 240 

Query: 242 GGVFIGLAYAWLN 254 

G V +G Y ms 

Sbjct: 241 GIVMMGWMYYTLN 253 

A related DNA sequence was identified in S. pyogenes <SEQ ID 6601> which encodes the amino acid
sequence <SEQ ID 6602>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.4906 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAB80864 GB:U93874 formate dehydrogenase [Bacillus subtilis]
Identities = 133/258 (51%), Positives = 181/258 (69%)

Query: 36 KTPEQILEATIHIGEHKVTKTFLftKSILGFIGGAMISLGYLLYVRIAASGLETFGAFSSI 95 

+ P++I EA I G K+ + +LGF+GGA I+LGYLL +R+ + +G+ SS+ 

Sbjct: 4 RKPDEIAEAAIEAGMKKIKLPLPSLLVLGFLGGAFIALGYLLDIRVIGDLPKEWGSLSSL 63 

Query: 96 VGACAFPIGLIIILMAGGELITGNMMAVSAALLAKKIKFSELAKNWLIITLFNVIGAVFV 155 

+GA FP+GLI++++AG ELITGNMM+V+ AL ++KI ELA NW I+T+ N+IGA+FV 
Sbjct: 64 IGAAWPVGLILVVLAGSffiLITGNmSVaMALFSRKISVKELAINWGIVTIMNLlGALPV 123 

Query: 156 AFVFGHFLGLTSAGIFKEEVIEVAHAKIAASPLQRLVSGIGCNWFVGLALWLCYGftNnAA 215 

A+ FGH +6LT G + E+ I VA K+ S + L+S IGCNW V LA+WL +GA DAA 
Sbjct: 124 AYFFGHLVGLTETGPVLEKTIAVAQGKLDMSFGKVLISAIGCNWLVCLAVWLSFGAQDAA 183 

Query: 216 GKFLGTWFPYMTFVALGFQHSVANAFVIPAAIFEGGATWLDFVTNFIFVYSGNIIGGAIF 275 

GK LG WFP+M FVA+GFQH VAN FVIPAAIF G TW F+ N I + GN+IGGA+F 
Sbjct: 184 GKILGIWFPIMAFVAIGFQHWANMFVIPAAIFAGSFTWGQFIGNIIPAFIGNVIGGAVF 243 

Query: 276 VSFLYFKVYYHPQKSKTQ 293 

V +YF Y+ +S+ + 
Sbjct: 244 VGLIYFIAYHKKDRSRKE 261 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 69/240 (28%), Positives = 101/240 (41%), Gaps = 18/240 (7%)

Query: 15 KEALYKESLGRYALRSMLAGftYLTMSTAAGIVAADTIGKISPALSGPVFAFIFSFGLIYV 74 

KLKLG +GL+AA +TG ASVAF GLI + 
Sbjct: 55 KTFLAKSILGFIGGAMISLGYLLYWIAAS--GLETFG AFSSIVGACAFPIGLIII 108 
Query: 75 LIFNGELATSNMLYLTAGAYNKNISWKKiWITILIYCTFFNLVGACILAWL-FNQSYSFQHL 134

L+ GEL T NM+ ++A K I + + + T FN++GA +A++F F h 

Sbjct: 109 LMRC3GELITGNM^mVSJU^IAKKIKFSEIJ«Q^WLIITIlFmrIGRV^ 165 


Query: 135 TNDSFLGHWAK KLGKPSSGAFLEGIIANMFVNLAILAYMLLKEESAKMWI 190 

T+ V + K+ A + GI N FV LA+ + + K + 

Sbjct: 166 TSAGIFKEEVIEVAHAKIAASPLQALVSGIGCNWFVGLALWLCYGANDAAGKFLGTWFPV 225 

Query: 191 FMFVFLSNEHLIANFASFMLAAFSHIEHIKGFTLLNIIRQWTLVFFGNWIGGGVFIGLAY 250
FV L +H +AN A F G T L+ + + V+ GN IGG +F+ Y 

Sbjct: 226 MTFVALGFQHSVaNAPVIPAAIFE GGATWLDFVTNFIFVYSGNIIGGAIFVSFLY 280 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
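
The `INTEGRAL Likelihood = ... Transmembrane ...` lines reported for membrane proteins such as the one in this Example can be collected into records for further filtering. The sketch below is illustrative only. The names `TMSegment` and `parse_tm_segments` are hypothetical, and reading the first range as the predicted segment and the parenthesised range as a wider flanking range is an assumption, since the report itself does not label them.

```python
import re
from typing import List, NamedTuple


class TMSegment(NamedTuple):
    likelihood: float
    start: int        # first range on the line (assumed: predicted segment)
    end: int
    outer_start: int  # parenthesised range (assumed: wider flanking range)
    outer_end: int


_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text: str) -> List[TMSegment]:
    """Return one TMSegment per INTEGRAL line found in text, in report order."""
    return [
        TMSegment(
            float(m.group(1)),
            int(m.group(2)),
            int(m.group(3)),
            int(m.group(4)),
            int(m.group(5)),
        )
        for m in _INTEGRAL.finditer(text)
    ]


if __name__ == "__main__":
    report = """
    INTEGRAL Likelihood = -9.92 Transmembrane 61 - 77 ( 54 - 82)
    INTEGRAL Likelihood = -5.57 Transmembrane 106 - 122 ( 103 - 126)
    INTEGRAL Likelihood = -1.01 Transmembrane 233 - 249 ( 233 - 249)
    """
    for seg in parse_tm_segments(report):
        print(seg.likelihood, seg.start, seg.end)
```

Keeping both ranges in the record lets a caller decide later which one to use, rather than fixing an interpretation in the parser.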

Example 2137 

A DNA sequence (GBSx2253) was identified in S.agalactiae <SEQ ID 6603> which encodes the amino 
acid sequence <SEQ ID 6604>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1342 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2138 

A DNA sequence (GBSx2254) was identified in S.agalactiae <SEQ ID 6605> which encodes the amino 
acid sequence <SEQ ID 6606>. Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.22 Transmembrane 44 - 60 ( 44 - 60)

Final Results 

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.




Example 2139 

A DNA sequence (GBSx2255) was identified in S.agalactiae <SEQ ID 6607> which encodes the amino 
acid sequence <SEQ ID 6608>. This protein is predicted to be xanthine permease (pbuX). Analysis of this 
protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14123 GB:Z99115 xanthine permease [Bacillus subtilis]
Identities = 213/412 (51%), Positives = 292/412 (70%), Gaps = 5/412 (1%)

Query: 14 LGLQHLLftMXAGSILVPIMIASALGYNaKQLTZLIATDIFMCGlATLLQLRLSKHFGVGL 73 

LG+OH+LaMyAG+I+VP+++ A+G +QLTyL++ DIFMCG+ATLLQ+ ++ FG+GL 
Sbjct: 11 LGIQHVIAMyAGAIVVPLIVGKAMGLTVEQLTYLVSIDIEMCGVATLLQVWSNRFFGIGL 70 

Query: 74 PWLGCAFQSVAPLSIIGAQQGSGYMFGALIASGIYWLVAGIFSKVANFFPPIVTGSVI 133

PWLGC F +V+P+ IG++ G ++G++IASGI V+L++ F K+ +FFPP+VTGSV+ 
Sbjct: 71 PWLGCTFTAVSPMIAIGSEYGVSTVYGSIIASGILVILISFFPGKLVSFFPPVVTGSW 130 

Query: 134 TTIGLTLIPVftMGNMGD- - -NAKEPSLQSLTLSLVTIGWLLINIFAEGFLKSISILIGL 190 

T IG+TL+PVRM NM +A L +L L+ + +++L+ F RGF+KS+SILIG+ 

Sbjct: 131 TIIGITLMPVAMNNMAGGEGSaDFGDLSNLALAFTVLSIIVLLYRFTKGFIKSVSILIGI 190 

Query: 191 ISGTILAAFMGLVDASWADAPLVHIPKPFYFGAPRFEFTSILMMCIIATVSMVESTGVY 250 

+ GT +A FMG V V+DA +V + +PFyFGAP F 1+ M I+A VS+VESTGVY 
Sbjct: 191 LIGTFIAYFMGKVQFDNVSDAAWQMIQPFYFGAPSFHftAPIITMSIVAIVSLVESTGVY 250 

Query: 251 IJttiSDITiroKLDSKRLRNGYRSEGLAVLLGGLFNTFPYTGFSQNVGLVQISGIRTRKPIY 310 

AL D+TN +L L GYR+EGLAVLLGG+FN FPYT PSQNVGLVQ++GI+ I 
Sbjct: 251 FALGDLTNRRLTEIDLSKGYRAEGLAVLLGGIFNRFPYTAFSGl!lW3LVQLTGIKiOlft.VIV 310 

Query: 311 FTALFLVILGLLPKFGAMAQMIPSPVLGGAMLVLFGMVALQGMKMLNQVDFEHNEHNFII 370 

T + L+ GL PK A +IPS VLGGAM+ +FGMV G+KML+++DF E N +1 
Sbjct: 311 VTGVILMAFGLFPKIAAFTTIIPSAVLGGArTOMFGM\nAYQIKMLSRIDFAKQE-NIiLI 369 

Query: 371 AAVSIAAGVGFNGT-NLFISLPHTLQMFLTNGIVISTLTAVVLNIILNGLPK 421 

A S+ G+G ++F LP+ L + TMGIV + TAWI1NI+ N K 

Sbjct: 370 VACSVGLGLGVTWPDIFKQLPSALTLLTTNGIVAGSFTAWLNIVYNVFSK 421 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6609> which encodes the amino acid 
sequence <SEQ ID 6610>. Analysis of this protein sequence reveals the following:

Possible site: 29 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.32 Transmembrane 160 - 176 ( 158 - 181) 
INTEGRAL Likelihood = -6.37 Transmembrane 103 - 119 ( 98 - 124) 
Final Results

bacterial membrane --- Certainty=0.3930 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB15234 GB:Z99120 similar to purine permease [Bacillus subtilis] 
Identities = 216/421 (51%), Positives = 302/421 (71%), Gaps = 5/421 (1%)

Query: 6 KQEHSHSQSAVLGLGHVLSMYAGSILVPIMIAGALGYSaRELTYLISTDIFMCGVATFLQ 65 
K++H+ Q +LGLQH+L+MYAG+ILVP+++ A+G +A +LTYLI+ D+FMCG AT LQ

Sbjct: 2 KEQHNALQIMMLGLQHMLAMmGAILVPLIVGAAIGIJOiGQLTyLIAIDLFMCGaATLLQ 61 

Query: 66 LKLTKHTGVGLPVVLGCAFQSVAPLSIIGAQQGSC3AMFGALIASGIYVILVAGIFSKIAR 125 
L ++ G+GLPWLGC F +V P+ IG+ G A++GA+IA+G+ V+L AG F K+ R 
Sbjct: 62 LWRNRYFGIGLPWLGCTFTAVGPMISIGSTYGVPAIYGAIIAflfiLIWLaAGFFGKLVR 121

Query: 126 FFPPIVTGSVITVIGLSLVGVAM6NM--GnNVKE-PTAQSMMLSLLTIVIILLVQKFTRG 182 

FFPP+VTGSV+ +IG+SL+ AM N+ G+ KE + +++L ILL+ P KG 

Sbjct: 122 FFPPVVTGSVVMIIGISLIPTAMNNLAGGEGSKEFGSLDNVLLGFGVTAFILLLFYFFKG 181 


Query: 183 FVKSISILIGLVAGTLVSAMMGLVDTTPWEASWIHVPTPFYFGMPTFEITSIVMMCIIA 242 

F++SI+IL+GL+AGT + MG VD + V+EASW+HVP+ FYFG PTFE+ ++V M ++A 
Sbjct: 182 FIRSIA1LI/3LIAGTAAAYFMGKVDFSEVLEASWLHVPSLFYFGPPTFELPAVVTMLLVA 241 

Query: 243 TVSMVESTGVYLALSDLTNDQLDEKRLRNGYRSEGIAVFLGGLFNTFPYTGFSQNVGLVQ 302

VS+VEST6VY AL+D+TN +L EK L GyR+EG+A+ LGGLEN FPYT FSQNVG+VQ 
Sbjct: 242 IVSLVESTGVYFALADITNRRLSEKDLEKGYRAEGLAILLGGLFNAFPYTAFSQNVGIVQ 301 

Query: 303 ISGIKTRRPIYYAAGILWIGLLPKFRAMAQMIPSPVLGGAMLVLFGMVALQGMQMLNRV 362 
+S +K+ I ILV IGL+PK A+ +IP+PVLGGAM+V+FGMV G++ML+ V

Sbjct: 302 LSKMKSVNVIAITGIILVAIGLVPKAAALTTVIPTPVLGGAMIVMFGMVISYGIKMLSSV 361 

Query: 363 DFQKNEYNFIIAAVSISaGLGENGT-NlFASLPETAQMFLTOGIVIATLTSWLNLVLNGK 422 
D ++ N +1 A S+S GLG LF+SL A + +GIVI +LT++ L+ K 

Sbjct: 362 DLD-SQGNLLIIASSVSLGLGATTVPALFSSLSGAASVLAGSGIVIGSLTAIALHAFFQTK 421

An alignment of the GAS and GBS proteins is shown below. 

Identities = 328/416 (78%), Positives = 380/416 (90%)

Query: 7 SNSQAALLGLQHLLAMYAGSILVPIMIASALGYNAKQLTYLIATDIFMCGIATLLQLRLS 66

S+SQ+A+LGLQH+L+MYAGSILVPIMIA ALGY+A++LTYLI+TDIFMCG+AT LQL+L+ 
Sbjct: 10 SHSQSAVLGLQUVLSMYAGSILVPIMIAGALGYSARELTYLISTDIFMCGVATFLQLKLT 69 

Query: 67 KHFGVGLPVVLGCAFQSVAPLSIIGAQQGSGYMFGALIASGIYVVLVaGIFSICVANPFPP 126 
KH GVGLPWLGCAFQSVAPLSIIGaOQGSG MFGALIASGIYV+LVAGIFSK+A PFPP

Sbjct: 70 KHTGVGLFWLGCAPQSVAPLSIIGAQQGSGaMFGALIASGIYVILVAGIFSKIARPFPP 129 



Query: 127 IVTGSVITTIGLTLIPVAMGNMGDNAKEPSLQSLTLSLVTIGWLLINIFAKGFLKSISI 186 
IVTGSVIT IGL+L+ VAMGNMGDN KEP+ QS+ LSL+TI ++LL+ F KGF+KSISI 
Sbjct: 130 IVTGSVITVIGLSLVGVRMGNMGDNVKEPTAQSMMLSLLTIVIILLVQKFTKGFVKSISI 189






Query: 187 LIGLISGTILAAEM3LViaSVV7U3APLVHIPKPPYPGAPRFEFTSIU4MCIIATVSMVES 246 

LIGL++GT+++A MGLVD + V +A +H+P PPYF6 P FE TSI+MMCIIATVSMVES 
Sbjct: 190 LIGLVAGTLVSAmGLVDTTPVVEASWIHVPTPPYFGMPTFEITSI^MCIIATVSMVES 249 

Query: 247 TGVYLALSDITNDKLDSKRLRN6YRSEGLAVLLGGLENTFPYTGFSQNVGLVQISGIRTR 306 
TGVYLALSD+TND+LD KRLRNGYRSEG+AV LGGLENTFPYTGFSQNVGLVQISGI+TR
Sbjct: 250 TGVYIALSDLTNDQLDEKRLRNGYRSEGIAWLGGLFm'FPYTGFSQSlVGLVQISGIK^ 309 

Query: 307 KPIYFTALFLVILGIJ^PKFGJmQMIPSPVIiGGRMLVIiFGNIVaiaSMK^^ 366 

+Piyf A LV++GLI1PKF AMaQMIPSPVLGGaMLVLFGMVALQGM+imN+VDF+ NE+ 
Sbjct: 310 RPIYYAAGILWIGLLPKFRAMAQMIPSPVI.GGRMLVLPGMVaLQGMQrc™.VDFQKNEy 369 

Query: 367 NFIIAAVSIAAGVGFNGTNLFISLPNTLQMFLTNGIVISTLTAWIiNIILNGLPKK 422 

NFIIAAVSI+AG+GFNGTNLF SLP T QMFLTNGIVI+TLT+WLN++IiNG K+ 
Sbjct: 370 NFIIAAVSISAGLGFNGTNLFASLPETAQMFLTNGIVIATLTSVVLNLVIiNGKDKQ 425 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2140 

A DNA sequence (GBSx2256) was identified in S.agalactiae <SEQ ID 6611> which encodes the amino 
acid sequence <SEQ ID 6612>. This protein is predicted to be xanthine phosphoribosyltransferase (xpt). 
Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1921 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA13587 GB:AJ233894 xanthine phosphoribosyltransferase
[Streptococcus pneumoniae]
Identities = 133/162 (82%), Positives = 144/162 (88%)

Query: 16 GEWILKVDSFLTHQVDFELMQEIGKVFADKXKERGITKVVTIEASGIAPAVYAAQALGVP 75 

G+NILKVDSFLTHQVDF LM+EIGKVFA+K+ AGITKWTIEASGIAPA++ A+AL VP 
Sbjct: 1 GDNILKVDSFLTHQVDFSLMREIGKVFAEKFASAGITKVV^IEASGIAPALFTAEAL^WP 60 

Query: 76 MIFAKKAKNITMTEGILTAEVYSFTKQVTSQVSIVSRFLSNDDTVLIIDDFLANGQAAKG 135 

MIFAKKAKNITM EGILTAEVYSFTKQVTS VSI +FLS +D VLIIDDFLANGQftAKG 
Sbjct: 61 MIFAKKAKNITMNEGILTAEVYSFTKQVTSTOSIAGKFLSPEDKVLIIDDFLftNGQftAKG 120 

Query: 136 LLEIIGQftGaKVaGIGIVIEKSFQDGRDLIiEKTGVPVTSLAR 177 

L++II QAGA V IGIVIEKSFQDGRDLLEK G PV SLAR 
Sbjct: 121 IiIQIIEQAGATVEAIGIVIEKSFQDGRDLLEKftGYPVLSLAR 162 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6613> which encodes the amino acid
sequence <SEQ ID 6614>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 156/193 (80%), Positives = 172/193 (88%)

Query: 1 MKLI^lERILKDGDVI^EKXLKTOSPLTHQVDFEMQEIGKVFADKyKEAGITKVVTIEAS 60 

M+LLEERIL DG++LGENILKVD+FLTHQVD+ LM+ IGKVFA KY EAGITKWTIEAS 
Sbjct: 1 MQLLEERILTIXSNILGENILKVDNFLTHQVDYRIMKAIGKVFAQKYAEAGITKVVTIEAS 60 
Query: 61 GIAPAVYAAQALGVPMIFAKKAKNITMTEGILTAEVYSFTKQVTSQVSIVSRFLSNDDTV 120

GIAPAVYAA+A+ VPMIFAKK KNITMTEGILTAEVYSFTKQVTS VSI +FLS +D V 
Sbjct: 61 GIAPAVYAaEAmVPMIPAKKHKNITMTEGILTAEVySFTKQVTSTVSlAGKFLSKEDKV 120 


Query: 121 LIIDDFIiUJGQaAKGLLEIIGQAGAKVAGIGIVIEKSFQDGRDLLEKTGVPVTSIiARIKA 180 

LIIDDFLANGQAARGL+EIIGQAGA+V G+GIVIEKSPQDGR L+E G+ VTSLftRIK 
Sbjct: 121 LlIDDFLANGQAAKGLIEIIGQAGAQWGVGIVIEKSFQDGRRLIEDMGIEVTSLaRIKN 180 

Query: 181 FENGRWFAEADA 193

FENG + F EADA 
Sbjct: 181 FENGNLNFLEADA 193 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2141 

A DNA sequence (GBSx2257) was identified in S.agalactiae <SEQ ID 6615> which encodes the amino 
acid sequence <SEQ ID 6616>. Analysis of this protein sequence reveals the following: 

Possible site: 28 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2546 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



>GP:CAB15203 GB:Z99120 similar to GMP reductase [Bacillus subtilis]
Identities = 243/321 (75%), Positives = 286/321 (88%), Gaps = 2/321 (0%)

Query: 7 VFDYEDIQLIPNKCIISSRSQADTSVKLGNYTFKLPVIPANMQTIIDEEVAETLACEGYF 66
VFDYEDIQLIP KCI++SRS+ DTSV+LG +TFKLPV+PANMQTIIDE++A +LA GYF
Sbjct: 4 VFDYEDIQLIPAKCIVNSRSECDTSVKLGGHTFKLPVVPANMQTIIDEKLAISLAENGYF 63

Query: 67
Y+MHRF E R FIK M+ +GL +SISVGVKD EY+FV L E+ PE++TIDIAHGH
Sbjct: 64

Query: 125
SN+VIEMIQH+K+ LP++FVIAGNVGTPEAVRELENAGADATKVGIGPGKVCITK+KTGF
Sbjct: 124

Query: 185
GTGGWQLAALRWC+KAA KPIIADGGIRTHGDIAKSIRFGA+MVMIGSLFAGH ESPG+
Sbjct: 184

Query: 245
+E +G+ +KEY+GSASE+ KGE KNVEGKK+ + KG ++DTL EM+QDLQSSISYAGG
Sbjct: 244

Query: 305
+i:i+++R+VDYVIVKNSI+NGD
Sbjct: 304



A related DNA sequence was identified in S.pyogenes <SEQ ID 6617> which encodes the amino acid
sequence <SEQ ID 6618>. Analysis of this protein sequence reveals the following:



Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2405 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 297/327 (90%), Positives = 311/327 (94%)

Query: 1 MENDIPVFDYEDIQLIPNKCIISSRSQADTSVKLGNYTFKLPVIPANMQTIIDEEVAETL 60

MENDIPVFDYEDIQLIPNKCII+SRSQftDTSV LG Y FKLPVIPANMQTXIDE +AE h 
Sbjct: 8 MF^^DIPVFDYEDIQLIPI!TOCIITSRSQaDTSVTI^KYQFKLPVIPM!^MQTIIDETIAEQL 67 

Query: 61 ACEGYPyiMHRFNEEERKPFIKRMHDKGLIASISVGVKDYEYDFVTSLKEDAPEFITIDI 120 

A EGYFYIMHRF+E+ RKPFIKRMH++GLIASISVGVK EY+FVTSLKEDAPEFITIDI 
Sbjct: 68 AKEGYFYIMHRFDEDSRKPFIKRMHEQGLIASISVGVKACEYEFVTSLKEDAPEFITIDI 127 

Query: 121 AHGHSNSVIEMIQHIKQELPETFVIAGNVGTPEAVRELENAGADATKVGIGPGKVCITKV 180 

AH6H+NSVI+MI+HIK ELPETWrAGNVGTPEAVRELENAGaDATKVGIGPGKVCITKV 
Sbjct: 128 AHGHANSVIDMIKHIKTELPETFVIASlWGTPEAVRELENAGBnATKVGIGPGKVCITKV 187 

Query: 181 KTGFGTGGWQIAALRWCSKAARKPIIADGGIRTHGDIAKSIRFGASMVMIGSLFAGHLES 240 

KTGFGTGGWQLAALRWC+KAARKPIIADGGIRTHGDIAKSIRFGASMVMIGSLFAGH ES 
Sbjct: 188 KTGFGTGGWQLAAIiRWCaKRARKPIIADGGIRTHGDIAKSIRFGASMVMIGSLFAGHFES 247 

Query: 241 PGKLVEVEGQQFKEYYGSASEYQKBEHKNVEGKKILLPVKGRLEDTLTEMQQDLQSSISY 300 

PGK VEV+G+ FKEYYGSASEYQKGEHKNVEGKKILLP KG h DTLTEMQQDLQSSISY 
Sbjct: 248 PGKOTEVDGETFKEYYGSASEYQKGEHKNVEGKKILLPTKGHLSDTLTEMQQDLQSSISY 307 

Query: 301 AGGKELDSLRHVDYVIVKNSIWNGDSI 327 

AGGK+LDSLRHVDYVIVKNSIWNGDSI 
Sbjct: 308 AGGKDLDSLRHVDYVIVKNSIWNGDSI 334 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2142 

A DNA sequence (GBSx2258) was identified in S.agalactiae <SEQ ID 6619> which encodes the amino 
acid sequence <SEQ ID 6620>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have an uncleavable N-term signal seq 
Final Results

bacterial membrane --- Certainty=0.7793 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB61253 GB:AJ250422 ORFC [Oenococcus oeni] 
Identities = 157/447 (35%), Positives = 252/447 (56%), Gaps = 13/447 (2%)

Query: 11 AIITTAILGFSGILIETSMNVTFPIMJCEFGVNPAVIQWVTTONriLAVAVTVPLSAFMIK 70

AI+ A L F G+LIETSMNVTFP LM++F ++ +QW+TT LL VA T+ ++AF+ K 
Sbjct: 15 AILGIAGLAFCGVLIETSMNOTFPTIiMQQFSISLNKVQWLTTAYLLLVAATISIAAFIEK 74 

Query: 71 NLSERQIFTIJiNVLFLSGVLIDSFAPNiAILLVGRVLQGVGTGLALPLLFHIIIiTQIPME 130

++IF A +LF+ GV+ + APN ILL+GR++Q + TGLA+PLL 1+ QIP + 
Sbjct: 75 RFIFKKIFPWAGLLFIIGVICSAIAPNFLILLIGRLIQALSTGLAIPLLITEIMQQIPQK 134 

Query: 131 RRGLMMGVAAMVTLLAPAVGPTYGGVISGMLGWKMIFMLLAPILIISTFIGLASIPKRQV 190 
++G M + + L P++6PTyGGVI+ L W++IF + PI +1+ IGL+ I ++

Sbjct: 135 KQGSY^ffiLVEraJLLWQPSLGPTYGGVITQDLSWRLIPWFVLPIGLIAWLIGLSFIEQKSS 194 

Query: 191 RINDKLNFPAFISLGIGLATLLLAIEKMSIF YLLVAIVSFVIFyYL--NKjQ 239 

+ FISL + L ++ +A+ 1+ +I1L+A++ ++F L N + 

Sbjct: 195 PSKIPFAWKQFISLILALLSITVaVNNAGIYGm'SIKFYGFLLmVILI.IVFIKLS 254

Query: 240 LEFLNLNVFKDKDFSILLYGVLAFQMIPLALSFLLPNLLQLVLHQTSTKAGLFMFPGAIA 299 

+++++FK +F L Q I L+L+FLLPN QL+L + +G+ + G++ 

Sbjct: 255 QALISISIPKKWEFVCPLLIYFLIQFIQLSLTFLLPNYAQLILKKGVMISGIMLLCGSLI 314 


Query: 300 WFLSPEflGYLLDKIGaFKPIMIGISLSLIGLlGTAIFIPAKSWVLLaFDILTKIGMGI 359 

L P G +LD P++1G + I IF SV ++ A ++ IG 

Sbjct: 315 SAILQPLTGRMLDSFSVKIPLVIGAFFLITSTISFTIFQRYLSVFLIAALYVIYMIGFSF 374 

Query: 360 GASNMVTTALTKLKPAQSADGNSILNTLQQFAGAFATAVASQIFTIGQVAIPKNGAIIGS 419

+N +T AL KL +DGN++ NTLQQ+AG+ T+VAS + G K GS 

Sbjct: 375 VFtTOSLTYALQKLPIiKLISIXSmVFITrLQQYAGSLGTSVASALIANGIGTDGKQSN^ 434 

Query: 420 Q--FAVLFVIWVILAIVGLTYIjRKRK 444 
+ F + F+ +++ ++ +K K

Sbjct: 435 RHIFIIJSIFISCAIWILIFSIQRKKNK 461 

There is also homology to SEQ ID 46. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2143 

A DNA sequence (GBSx2259) was identified in S.agalactiae <SEQ ID 6621> which encodes the amino
acid sequence <SEQ ID 6622>. Analysis of this protein sequence reveals the following: 

Possible site: 52 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2151 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6595> which encodes the amino acid
sequence <SEQ ID 6596>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 74/214 (34%), Positives = 112/214 (51%), Gaps = 5/214 (2%)

Query: 13 NESE™FFITLKTYFNYLFSIQIIT---DISTIJ!JHaDFDGSFAFHDIETSIPHLVIDSNY 69 

N+ E F L +F++LF + I+T +1 ,+ + F G F+FH+ + +P L ++ 
Sbjct: 15 NQLEETFIRELSHHFSHLFEVTILTSKftNIQSNQLSTFQGIFSFHEHDIDLPTLYFKTSQ 74 

Query: 70 miSQraSKIEaNDIKTFSELSKTMTEFHYMiailFDLFNHLPYRFRLHimiGQTIYS^ 129 

++ + LS+ +T F+ + +LP + RL + +G I NH 

Sbjct: 75 HGQGFLVTESVEDQATAVLSLSQYLTGFYQKFIXSHFLQYLPLCJftRLSDftNGNIIVDN^ 134 

Query: 130 EDPFDIYPEEEYPIDKWVQNSLIEKKAKELHLLLPSASQDYILVQSYKRLENDSGQLVGY 189 

F P + 1+ W+ L LLPS S D+I +Q Y+ L+N GQLVG 

Sbjct: 135 NGSF--LPTTDKEIEDWILAELRLSDNPCKTFLLPSGSLDHIYMQHYQftLKNPQGQLVGV 192 

Query: 190 lEHVHNIKPLLEGYLKESGQAIVGWSDVTSGASI 223 

++ V +IKPLL YL+E+GQAIVGWSDVTSG SI 
Sbjct: 193 LDTVQDIKPLLNQYLEETGQAIVGWSDVTSGPSX 226 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2144 

A DNA sequence (GBSx2260) was identified in S.agalactiae <SEQ ID 6623> which encodes the amino 
acid sequence <SEQ ID 6624>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.5840 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF84709 GB:AE004010 potassium uptake protein [Xylella 
fastidiosa] 

Identities = 201/570 (35%), Positives = 319/570 (55%), Gaps = 34/570 (5%)

Query: 1 MAEMQHVNHSSFDKASKAGFII- -ALGIVYGDIGTSPLYTMQSLVENQGGISSVTESFIL 58 

M+ H + ++ G IX A+G+V+GDIGTSPLYT-I-+ G++ ++ +L 

Sbjct: 1 MSTSSHSGDCTAVPSNSNGTIILSAIGWFGDIGTSPLYTLKEAFSPNYGLTPNHDT-VL 59 

Query: 59 GSISLIIWTLTLITTIKYVLVALKADNHHEGGIFSLYTLVRKMTPW LIVPAVI 111 

G +SLI W + L+ TIKYV V ++ DN EGGI +L L ++ P+ + + + 

Sbjct: 60 GILSLIFWAmLVVTIKYVAVIMRVDNDGEGGIMALTALTQRTMPFGSRSIYIVGILGIF 119 

Query: 112 GGATLLSDGALTPAVTVTSAVEGLKWPSLQHIFQNQSNVIFATLFILLLLFAIQRFGTG 171 

G + DG +TPA++V SAVEGL+V F V+ TL +L+LLF QRFGT 

Sbjct: 120 GTSLFFGDGVITPAISVLSAVEGLEVAEPHMKAF WPITLAVLILLFLCQRFGTE 174 



Query: 172 VIGKLFGPIMFIWFAFLGISGLLMSFAHPEVFKAINPYYGLKLLFSPENHKGIFILGSIF 231 

+GK FGPI +WF +G+ G+ N pgv AINP +GL F +F+LG++ 
Sbjct: 175 RVGKTFGPITLLWFIAI6VVGVYN1AQAPEVLHAINPSWGLH-FFLEHGWHSMFVLGAVV 233 




Query: 232 lATTGAEALYSDLGHVGRGNIHVSWPFVKVAII-LSYCGQGAWILANKEaGNEIiNPFFM 290 

LA TG EALY+D+GH G I +W +V + ++ L+Y GQGA +L+N A NPF+ S 
Sbjct: 234 lAWGGEALYADMGHFGAKAIRHAVMYVVLPMLAIitraXSQGALVLSNPTAIG- -NPFYQS 29i 

Query: 291 IPSQFTMHWILATLAAIIASQALISGSFTLVSEaMIUjKIFPQFRSTYPGDK-IGQTYIP 349 

IP ++ LAT AA+IASQALI+GS+H-L S+AM+L P+ + + IGQ Y+P 

Sbjct: 292 IPDWGLYPMIALATAAAVIASQALITGSYSLSSQAMQLGYIPRMNVRHTSQSTIGQIYVP 351 

Query: 350 VINWFLFAITTSIVLLFKTSAHMEAAYGLAITITMLMTTILLSFFL-IQKGVKRGLVLLM 408 

+NW L + V+ F S M +AYG+A+T TM++TT+L+ + V R ++ +M 

Sbjct: 352 TVNWTLLTLVILWIGF6DSTS^aSAYG\aVTGTMMITTVLMIIYAIUaIPRVPRLMLW^ 411 

Query: 409 MIFFGILEGIFFLASAVK™HGGyVVVIIAVAIIPIMTIWYKGSKIVSRYVKL--LDLKD 466 

I F ++G FF A+ +KFM G + +++ VI M W +G K++ ++ ++L + 
Sbjct: 412 AIVFmVDGAFFYANIIKFMDGAWFPLLLGWIFTFrWTWLRGRKLLHEEITOKDGINLDN 471 

Query: 467 YIGQLDKLRHDHRYPIYHTNVVYLTNRMEEDMIDKSIMYSILDKRPKKAQVYWFVNIKVT 526 

++ L L + P V+LT + ++ ++M+++ + + F+ +K 

Sbjct: 472 FI4PGL-^(ILAPPVKVP---GTAWLT--ADSTWPHAIM^]S^JKHNKVLHEa^W 524 

Query: 527 DEPYTA EYKVDMMGTDFIVKVELYLGF 553 

PY A K++ + F +V + GF 
Sbjct: 525 KIPYAANSEREiKIEPISNGF-YRVHIRFGF 553 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6625> which encodes the amino acid
sequence <SEQ ID 6626>. Analysis of this protein sequence reveals the following:

Possible site: 15 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.5713 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF84709 GB:AE004010 potassium uptake protein [Xylella
fastidiosa]

Identities = 177/467 (37%), Positives = 270/467 (56%), Gaps = 20/467 (4%)

Query: 7 TAFDKASKAGFII-ALGIVYGDIGTSPLYTIQSLVENQGGVNQVSESFILGSISLIIWTL 65 

TA S 1+ A+G+V+GDIGTSPLYT++ 6+ ++ +LG +SLI W + 

Sbjct: 11 TAVPSNSNGTIILSAIGWFGDIGTSPLYTLKEAFSPNYGLTENHDT-VLGILSLIFWAM 69 

Query: 66 TLITTIKYVLIALKADNHHEGGIFSLFTLVRKMSPW LIIPAMIGGATLLSDGA 118 

L+ TIKYV + ++ DN EGGI +L L ++ P+ + I + G + DG 

Sbjct: 70 MLWTIKYVAVIMRVDNDGEGGIMALTALTQRTMPFGSRSIYIVGILGIFGTSLFFGDGV 129 

Query: 119 LTPAVTVTSAIEGLKAVPGLSHIYQNQTNVIITTLVILIVLFGIQRFGTGFIGKIFGPVM 178 

+TPA++V SA+E6L+ + V+ TL +LI+LF QRFGT +GK FGP+ 
Sbjct: 130 ITPAISVLSAVEGLEVAEPHMKAF WPITLAVLILLFLCQRFGTERVGRTFGPIT 184 

Query: 179 FIWFSFLGVSGFENTLGHLEIFKAINPYYALHLLFSPENHRGIFILGSIFLATTGAEALY 238 

+WF +GV G +N E+ AINP + LH F +F+LG++ LA TG EALY 

Sbjct: 185 LLWPIAIGWGVraiAQAPEVLHAINPSWGLH-FFLEHGWHSMFVLGAWLAVTGGEALY 243 



Query: 239 SDKSHVGRGNIYVSWPFVKM-CIVLSYOSQAAWILANKHSGIELNPFFASVPSQLRVYLV 297 
+D+GH.G I +W +V + + L+Y GQ A +L+N + NPF+ S+P ++
Sbjct: 244 MJMGHFGAKaiRHAlMrVVLPMIAIJSnnjGQGftLTO^ 301 

Query: 298 SLATLAAIIASQALISGSFTLVSEAMRLKIFPLFRVTYPG-iiNLGQLyiPVIIWILFAVT 356
+LA.T AA+IASdMiI+GS+H-L S+AM+L P V + + +GQ+y+P +NW L +

Sbjct: 302 aiATAftAVIASQMiITGSYSLSSQ2\MQIiGYIPRMI!mmTSQSTIGQIYVPT\n5^ 361 

Query: 357 SCTVIAFRTSAHMEAAYGLAITITMLMTTILIiKYYLIKKGTRPIIiAHLVMAF-FALVEFI 415 
TV+ F S M +AyG+A+T TM++TT+L+ Y PL +MA F V+ 

Sbjct: 362 ILTVIGFGDSTSI^AYGVAVTGTMMITTVLMIIYARflNPRVPRLMLtWIMAlVFIAV^ 421

Query: 416 FFIlASAIKF^fflGGYAWILAIAIVPVMFIWHaGTRIWKYVKSIi^M 462 

FF A+ IKFM G + ++L +1 M W 6 +++ + ++ +N 
Sbjct: 422 FFYaNIIKPMDGAWFPLLLGWIFTFMRTWIiRGRKLLHEEMRKDGIN 468 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 485/651 (74%), Positives = 575/651 (87%)

Query: 10 SSFDKASKAGFIIALGIVYGDIGTSPLYTMQSLVENQGGISSVTESFILGSISLIIWTLT 69

++FDKASKAGFIIALGIVYGDIGTSPLYT+QSLVENQGG++ V+ESFILGSISLIIWTLT

Sbjct: 7 TAFDKASKAGFIIALGIVYGDIGTSPLYTIQSLVENQGGVNQVSESFILGSISLIIWTLT 66

Query: 70 LITTIKYVLVALKADNHHEGGIFSLYTLTOKMTPWLIVPAVIGGaTLLSDGALTPAVTVT 129 
LITTIKYVL+ALKAnNHHEGGIFSL+TLWKM+PWLI+PA+IGGATLLSDGALTPAVTVT 
Sbjct: 67 LITTIKYVIilALKRDNHHEGGIFSLFTLVRKMSPWLIIPAMIGQATLLSDGALTPAVTVT 126

Query: 130 SAVEGLKWPSLQHIFQNQSNVIFATLFILLLLFAIQRFGTGVIGKLFGPIMFIWFAFLG 189 

SA+EGLK VP L HI+QNQ+NVI TL IL++LF IQRFGTG IGK+FGP+MFIWF+FLG 
Sbjct: 127 SAIEGLKAVPGLSHIYQNQTNVIITTLVILIVLFGIQRFGTGFIGKIFGFVMFIWFSFLG 186 


Query: 190 ISGLUISFAHPEVFKAINPYYGLKLLFSPEaSlHKGIFILGSIFLATTQaEALYSDLGHVGR 249 

+SG N+ H E+FKAINPYY L LLFSPENH+GIFILGSIFLATTGAEALYSDL6HVGR 
Sbjct: 187 VSGFFOTI^HLEIFKAINPYYALHIJIiFSPENHRGIFIICSIFLATTGftEALYSDLGHVGR 246 

Query: 250 GNIHVSWPFVKVAIILSYCGQGAWILaNKmGNELNPFFASIPSQFTMHWIIATLAAlI 309

GNI+VSWPFVK+ I+LSYCGQ AWILANK++G ELNPFFAS+PSQ +++V LATLAAII 
Sbjct: 247 GNIYVSWPFVKMCIVLSYCGQAAWILANKHSGIELNPFFASVPSQLRVYLVSLATLAAII 306 

Query: 310 ASQALISGSFTLVSEAMRLKIFPQFRSTYPGDNIGQTYIPVINWFLFAITTSIVLLFKTS 369 
ASQaLISGSFTLVSEftMRLKIFP FR TYPG N+GQ YIPVINW LFA+T+ VL F+TS

Sbjct: 307 ASQALISGSFTLVSEAMRLKIFPLFRVTYPGMSILGQLYIPVINWILFAVTSCTVLAFRTS 366 

Query: 370 AHMEAAYGLAITITMLMTTILLSFFLIQKGVKRGLVLLMMIFFGILEGIFFLASAVKFMH 429 
AHMEAAYGLAITITMLMTTILL ++LI+KG + L L+M FF ++E IFFLASA+KFMH 
Sbjct: 367 AHMEAAYGIAITITMLMTTILLKYYLIKKBTRPILftHLVNiaFFALVEFIFFI^^ 426

Query: 430 GGYVWIIAVAIIFIMTIWYKGSKIVSRYVKLIJDLKDyiGQLDKIiRHDHRYPIYHTNVVY 489 

GGY WI+A+AI+F+M IW+ G++IV +YVK L+L DY Q+ +IiR D + +Y TNWY 
Sbjct: 427 GGYAWILALAIVFVMFIWHAGTRIVFKYVKSLKnaroYKEQlKQLRDDVCFDLYQTN^ 486 


Query: 490 LTNR^ffiEDMIDKSIMYSILDKRPra<AQVYWFWIKVTDEPYTAEYKVDMMGTDFIVKVEL 549 

L+NRM++ MID+SI+YSILDKRPK+AQVYWFVN++VTDEPyTA+YKVDMMGTD++V+V L 
Sbjct: 487 LSNRMQDHMIDRSILYSILDKRPKRAQVYWFVOTQVTDEPYTAKYKTOMMGTDYMVRVNL 546 

Query: 550 YLGFKMRQTVSRYLRTIVEELLESGRLPKQGKTYSVRPDSNVGDFRFIVLDERFSSSQNL 609

YLGF+M QTV RYLRTIV++L+ESGRLPKQ + Y++ P +VGDFRF++++ER S+++ L 
Sbjct: 547 YLGFRMPQTVPRYLRTIVQDIiMESGRLPKQEQEYTITPGRDVGDFRFVLIEERVSNaRQL 606 

Query: 610 KPGERFVMLMKSSIKHWTATPIRWFGLQFSEVTTEWPLIFTANRGLPIKE 660 
ERF+M K+SIKH TA+P+RWFGLQ+SEVT EWPLI + LPIKE

Sbjct: 607 SIWERFIMQTKASIKHVTASPMRWFGLQYSEVTLEVVPLILSDVLKLPIKE 657 

A related GBS gene <SEQ ID 8983> and protein <SEQ ID 8984> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 5.84
GvH: Signal Score (-7.5): -4.59

Possible site: 18
>>> Seems to have an uncleavable N-term signal seq
modified ALOM score: 2.92



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.5840 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02578(367 - 1680 of 2607) 

GP|9106998|gb|AAF84709.1|AE004010_6|AE004010(25 - 463 of 634) potassium uptake protein 
{Xylella fastidiosa} 
%Match =17.8 

%Identity =40.4 %Similarity =63.7 

Matches = 177 Mismatches = 150 Conservative Sub.s = 102 

180 210 240 270 300 330 360 390 

TSTCLS*LK**RPGNALIISGLFIDKCCFEm^ICyNEPSHFFD*YYLIG6LaEMQHVliIHSSFDKASKftGFIIALGIVYGD 




10 20 30 



420 450 480 510 540 570 600 612 
IGTSPLYTMQSLVEN(2GGISSVTESFILGSISLIIWTLTLITTIKYVLVALKaDNHHEGGIFSLYTLVRKMTP W 




50 60 70 80 90 100 110 



639 669 699 729 759 789 819 849 

LI-VPAVIGGATLLSDGALTPAVTVTSAVEGLKWPSLQHIFQNQSNVIPATLFILLLLFAIQRFGTGVIGKLFGPIMFI 





130 140 150 160 170 180 



879 909 939 969 999 1029 1059 1089 

WFAPLGISGLLNSPAHPEWKAINPYYGLKLLFSPENHKGIFILGSIFLATTGAEALYSDLGHVGRGNIHVSWPPV^ 



WFIAIGWGVYNIAQAPEVLHAINPSWGLHFPLEHGWHS-MFVLGAVVLAWGGEALYADMGHFGAKAIRHAraWVVLPM 
200 210 220 230 240 250 260 



1116 1146 1176 1206 1236 1266 1296 1326 

I-LSYCGQGAWIIiflNKNAGNEIJTPFFASIPSQFTMHVVILATLAAIIASQALISGSFTLVSEAMRLKIFPQFRSTYPGnN 




280 290 300 310 320 330 340 



1353 1383 1413 1443 1473 1500 1530 1560 

-IGQTYIPVINWFLFAITTSIVLLFKTSAHMEAAYGLAITITMLMTTILLSFFL-IQKGVKRGLVLLMMIFFGILEGIFF 



TIGQIYVPTVNWTLLTLVILTVIGFGDSTSMASAYG\».VTGTMMITTVIMIIYARaNPRVPRIJ1Llfl^^ 
360 370 380 390 400 410 420 

1590 1620 1650 1680 1710 1740 1770 1800 

LASAVKFMHGGYVWI lAVAI I PIMTIWYKGSKIVSRYVKLIJDLKDYIGQLDKLRHDHRYPiyHTNVVYLTNRMEEDMID 
5 |: :|1| I : ::: I I I I :| |:: :: 

YiaillKFMDGAWFPLLLGWIFTFMRTWLRGRKLLHEErroKDGINLDNFLPGLMIJ^PVKV^ 

440 450 ,460 470 480 490 500 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2145 

A DNA sequence (GBSx2261) was identified in S.agalactiae <SEQ ID 6627> which encodes the amino 
acid sequence <SEQ ID 6628>. This protein is predicted to be serine dehydrogenase. Analysis of this 
protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3251 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD07424 GB:AE000552 short chain alcohol dehydrogenase 
[Helicobacter pylori 26695] 
Identities = 18/31 (58%) , Positives = 25/31 (80%) 

Query: 3 WVASQPEHININRIEIMPVSQTYGPQPVYRD 33 
W+ QP H+NINRIEIMP+SQT+ P P +++ 
Sbjct: 219 WIYEQPLHVNINRIEIMPISQTFAPLPTHKN 249 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6629> which encodes the amino acid 
sequence <SEQ ID 6630>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1021 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 24/33 (72%) , Positives = 29/33 (87%) 

Query: 1 MSWVASQPEHININRIEIMPVSQTYGPQPVYRD 33 

+SWV QP H+N+NRIE+MPVSQ+YGPQPV RD 
Sbjct: 20 VSWVIHQPPHVNVNRIELMPVSQSYGPQPVTRD 52 
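
The identity and positive counts quoted for an alignment can be recomputed from the three alignment lines themselves. The following Python sketch is illustrative only and is not part of the original analysis; the function names are hypothetical. It assumes the query and subject strings are already aligned column by column, counts identical columns directly, and reads positives off the match line, where a letter marks an identity and "+" marks a conservative substitution.

```python
from __future__ import annotations


def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical columns, alignment length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # A column is an identity when both residues match and neither is a gap.
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query)


def count_from_match_line(match_line: str) -> tuple[int, int]:
    """Return (identities, positives) read off a BLAST-style match line."""
    identities = sum(1 for c in match_line if c.isalpha())
    positives = identities + match_line.count("+")
    return identities, positives


# GAS/GBS alignment lines as printed above (Example 2145).
query = "MSWVASQPEHININRIEIMPVSQTYGPQPVYRD"
subject = "VSWVIHQPPHVNVNRIELMPVSQSYGPQPVTRD"
match_line = "+SWV QP H+N+NRIE+MPVSQ+YGPQPV RD"

identical, length = count_identities(query, subject)
ids_from_match, positives = count_from_match_line(match_line)
print(f"Identities = {identical}/{length}")
print(f"Positives = {positives}/{length}")
```

Counting from the match line does not depend on its column spacing, which is useful here because whitespace in the printed match lines is not reliably preserved.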

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2146 

A DNA sequence (GBSx2262) was identified in S.agalactiae <SEQ ID 6631> which encodes the amino 
acid sequence <SEQ ID 6632>. Analysis of this protein sequence reveals the following: 
Possible site: 21 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9337> which encodes amino acid sequence <SEQ ID 9338> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10781> which encodes amino 
acid sequence <SEQ ID 10782> was also identified. A further related GBS nucleic acid sequence <SEQ ID 
10951> which encodes amino acid sequence <SEQ ID 10952> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA32349 GB:X14130 ORF (AA 1 to 299) [Lactococcus lactis subsp. cremoris] 

Identities = 72/215 (33%) , Positives = 110/215 (50%) , Gaps = 8/215 (3%) 

Query: 4 RSKLAAGFLTLMSVATIAACSGKTSNGTN- -VVTMKGDTITVSDFYDQVKTSKAAQQSML 61 
+ K+ L + L SG SN T+ V T G +T S FY ++K S + + 
Sbjct: 2 KKKMRLKVLLASTATALLLLSGCQSNQTDCJTVATYSGGKOTESSFYKELKQSPTTCT^ 61 

Query: 62 TLILSRVFDTQYGDKVSDKKVSEAYNKTAKGYGNSFSSALSQAGLTPEGYRQQIRTTMLV 121 

+++ R + Y6 VS K V++AY+ + YG +F + LSQ G + +K+ +RT L 
Sbjct: 62 NMLIYRAliNHAYGKSVSTKTVNDAYDSYKQQYGENFDAFLSQNGFSRSSFKESLRTNFLS 121 


Query: 122 EYAVKEAAKKELTEANYKEAYKNYTPETSVQVIKLDAEDKAKSVLKDVKADGADFAKIAK 181 

E A+K+ K+++E+ K A+K Y P+ +VQ I ED AK V+ D+ A G DFA +AK 
Sbjct: 122 EVALKKL--KKVSESQLKAAWiaYQPKVTVQHILTSDEDTAKQVISDLAA-GKDFAMIiAK 178 

Query: 182 E---KTTATDKKVEYKFDSAGTTLPKEVMSAAFKL 213 

T D + F+ TL AA+KL 
Sbjct: 179 TDSIDTATKIJNGGKISFEIJqNRrriDATFKDAAYKL 213 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6633> which encodes the amino acid 
sequence <SEQ ID 6634>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAA25247 GB:M83946 maturation protein [Lactobacillus paracasei] 
Identities = 88/294 (29%) , Positives = 146/294 (48%) , Gaps = 14/294 (4%) 

Query: 7 LIASVVTLASVMALAACQSTNDNTKVISMKGOTISVSDFYNETKNTEVSQKAMIJ^ 66 
L+AS T +++ L+ CQS + KV + G ++ S+FY E K + ++ + N++I R 

Sbjct: 10 LLASTAT--ALLLLSGCQSNQADQKVATYSGGKVTESNFYKELRQSPTTKTMLANMLIYR 67 

Query: 67 VFEAQYGDKVSKKEVEKAYHKTAEQYGASFSAALAQSSLTPETFKRQIRSSKLVEYAVKE 126 
YG VS K V AY +QYG +F A L+Q+ + +FK +R++ L E A+K+ 
Sbjct: 68 AIJfflAYGKSVSTKTVNDAYDSYKQQYGENFDAFLSQNGFSRSSFKESLRTNFLSEVALKK 127 

Query: 127 AAKKELTTQEYKKAYESYTPTMAVEMITLDNEETAKSVLEELKAEGADFTAIAKE KT 183 

K+++ + K +++Y P + V+ I +E+TAK V+ +L A G DF +AK T 
Sbjct: 128 L--K3WSESQLKAVWKTYQPKVTVQHILTSDEDTAKQVISDL-AAGKDFATLAKTDSIDT 184 






Query: 184 TTPEKKOTYKFDSGATNVPTDVVKAASSLNEGGISDVISVLDPTSYQKKFYIVKVTKKAE 243 
T + F+S + AA L G + 

Sbjct: 185 ATKDNGGKISPESNNKTLDATFKDAAYKLKNGDYTQT 



P + ++K+ 

PVKVTNGYEVIKMINH-P 238 



Query: 244 KKSDWQEYmiLKAIIIAEKSKDMNFQinCVIANALDKAlWKIKDKM 297 

K + KK Ii A + A+ S+D + +VI+ L +V IKDK A+ L Y 
Sbjct: 239 AKGTFTSSKKALTMWAKWSRDSSIMQRVISQVLKNQHVTIKDKDIADALDSY 292 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 125/213 (58%) , Positives = 168/213 (78%) , Gaps = 1/213 (0%) 

Query: 1 MKTRSKLAAGFLTLMSVATIAACSGKTSNGTNVVTMKGDTITVSDFYDQVKTSKAAQQSM 60 

MK +KL A +TL SV LAAC T++ T V++MKGDTI+VSDFY++ K ++ +Q++M 
Sbjct: 1 MKNSNKLIASVVTLASVMALAACQS-TNDNTKVISMKGDTISVSDFYNETKNTEVSQKAM 59 

Query: 61 LTLILSRVFDTQYGDKVSDKKVSEAYNKTAKGYGNSFSSALSQAGLTPEGYKQQIRTTML 120 

L L++SRVF+ QYGDKVS K+V +AY+KTA+ YG SFS+AIj+Q+ LTPE +K+QIR++ L 
Sbjct: 60 IJ)&VISRVFEAQYGDKVSKKEVEKAYHKTAEQYGASFSAAIiAQSSLTPETFKRQIRSSKL 119 

Query: 121 VEYAVKEAAKKELTEANYKEAYKNYTPETSVQVIKLDAEDKAKSVLKDVKADGADFAKIA 180 
VEYAVKEAAKKELT YK+AY++YTP +V++I LD E+ AKSVL+++KA+GADF IA 
Sbjct: 120 VEYAVKEAAKKELTTQEYKKAYESYTPTMAVEMITLDNEETAKSVLEELKAEGADFTAIA 179 

Query: 181 KEKTTATDKKVEYKFDSAGTTLPKEVMSAAFKL 213 

KEKTT +KKV YKFDS T +P +V+ AA L 
Sbjct: 180 KEKTTTPEKKVTYKFDSGATNVPTDVVKAASSL 212 
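
The summary line that heads each alignment ("Identities = .../... , Positives = .../... , Gaps = .../...") can be read programmatically. The sketch below is a minimal illustration in Python, not part of the original analysis; the names SUMMARY_FIELD and parse_summary are hypothetical. It extracts the raw counts and reports exact ratios, since the bracketed percentages printed in the summary lines are whole numbers.

```python
import re

# Matches e.g. "Identities = 125/213"; the bracketed percentage is ignored.
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, alignment_length)} for one summary line."""
    return {name: (int(n), int(d)) for name, n, d in SUMMARY_FIELD.findall(line)}


# Summary line of the GAS/GBS alignment shown above (Example 2146).
stats = parse_summary(
    "Identities = 125/213 (58%) , Positives = 168/213 (78%) , Gaps = 1/213 (0%)"
)
for name, (n, d) in stats.items():
    print(f"{name}: {n}/{d} = {n / d:.3f}")
```

Summary lines without a Gaps field, which also occur in this document, simply yield a dictionary with two entries.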

SEQ ID 10782 (GBS657) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 143 (lanes 8-10; MW 62.8kDa) and in Figure 187 (lane 3; MW 63kDa). 
Purified GBS657-GST is shown in Figure 245, lanes 2 & 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2147 

A DNA sequence (GBSx2263) was identified in S.agalactiae <SEQ ID 6635> which encodes the amino 
acid sequence <SEQ ID 6636>. This protein is predicted to be methyltransferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA68045 GB:X99710 methyltransferase [Lactococcus lactis] 
Identities = 132/227 (58%) , Positives = 169/227 (74%) 

Query: 1 MVQSYSKNANHNMRRPVVKEEIVQYMRQHQKQNNGCLAELEAFAKQENIPIIPHETATYF 60 

MV++Y +N M RPVVK E+V++MR Q Q G LAE+ FAK+ NIP+IPHET YF 
Sbjct: 1 MVETYKSTSNPMMNRPVVKAELVEWMRSSQTQVTGELAEVLNFAKENNIPW 60 

Query: 61 RFLMQTLQPKHILEIGTAIGFSALLMAENAPEAKITTIDRNEEMIALAKENFAK^^ 120 

+ L+ L+PK ILEIGTAIGFSAL+MA+ PEA+I TIDRN EMI LAK+N AKYD+ NQ 
Sbjct: 61 QMLLSLLKPKRILEIGTAIGFSALVMAQEVPEAEIVTIDRNPEMIELAKKNLAKYDHRNQ 120 

Query: 121 ITLLEGDAVDVLQTLDKSYDFVFMDSAKSKYIVFLPQVLKHLDVGGVVVLDDIFQGGDIA 180 

I L EGDA DVLQ L +D VFMDSAKSKY+ FLP+ L+ L G++++DD+FQ G+I 
Sbjct: 121 IQLKEGDAADVLQELKGPFDLVFMDSAKSKYVEFLPKSLELLSENGLILMDDVFQAGEIL 180 

Query: 181 KPIDEVRRGQRTIYRGLQRLFDSTLQHPDLTATLVPLGDGLLMIRKN 227 

PI EV+R QR + RGL++LFD +P +++PLGDGLLMI+K+ 
Sbjct: 181 LPIMEVKRNQRALERGLRKLFDEVFDNPKYOTSVLPLGDGLLMIKKH 227 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6637> which encodes the amino acid 
sequence <SEQ ID 6638>. Analysis of this protein sequence reveals the following: 

Possible site: 46 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 153 - 169 ( 152 - 170) 

Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

>GP:CAA68045 GB:X99710 methyltransferase [Lactococcus lactis] 
Identities = 134/227 (59%) , Positives = 169/227 (74%) 



Query: 1 MVKSYSKTANHNMRRPWKEELVHYMRTRQKQTTGFLAELEQFARQENIPIIQPEWAYF 60 

MV++Y T+N M RPWK ELV +MR+ Q Q TG LAE+ FA++ MIP+I E V YF 
Sbjct: 1 MVETYKSTSNPMMmPWKAELVEWIffiSSQTQVTGEL2ffiVLNFAKEN^ 60 

Query: 61 RPLU3SLQPKHILEIGTAIGFS2iLLMAENAPimTIVTIDRNREMIDFAKKNFAKYDSRQQ 120 

+ LL L+PK ILEIGTAIGFSM.+MA+ P+A IVTIDRN EMI+ AK N AKYD R Q 
Sbjct: 61 QMLLSLLKPKRILEIGTAIGFSALVMAQEVPEAEIVTIDRNPEMIELAKKNIAKYDHRNQ 120 

Query: 121 IRLLEGDAADILSTLEGNFDFVFMDSAKSKYIVFLPEILRLLKVGGWILDDVFQGGDIT 180 

I+L EGDAAD+L L+G FD VPMDSAKSKY+ FLP+ L LL G++++DDVFQ G+I 
Sbjct: 121 IQLKEGDRADVLQELRGPFDLVPMDSAKSKYVEFLPKSLELLSENGLILMDDVFQafiEIL 180 

Query: 181 KPIEDIRRGQRTIYRGLQSLFDATLTHENLTTSLVPLSDQLLMIRKN 227 

PI +++R QR + RGL+ LFD +P TS++PL DGLI1MI+K+ 
Sbjct: 181 LPIMEVKRNQRALERGLRKLFDEVFDNPKXMTSVLPLGDGLLMIKKH 227 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 177/235 (75%) , Positives = 199/235 (84%) 

Query: 1 MVQSYSKNANHNMRRPWKEEIVQYMRQHQKQNNGCLAELEAFAKQENIPIIPHETATYF 60 

MV+SYSK ANHNMRRPWKEE+V YMR QKQ G LAELE FA+QENIPII E YF 
Sbjct: 1 MVKSYSKTANHNmRPTOCEELVHYMRTRQRQTTGFLliELEQPARQEN^ 60 

Query: 61 RFLMQTLQPKHILEIGTAIGFSALLMAENAPEaKITTIDRNEEMIALAKENFAKyni^ 120 

RFL+Q+LQPKHILEIGTAIGFSaLLMAENaP+A I TIDRN EMI AK WFAKYD+ Q 
Sbjct: 61 RFLLQSLQPKHILEIGTAIGFSALLMAENAPDATIVTIDRNREMIDFAKaNFAKYDSRQQ 120 



Query: 121 ITLLEGDAVDVLQTLDKSYDFVFMDSAKSKYIVFLPQVLKHLDVGGWVLDDIFQGGDIA 180 

I LLEGDA D+L TL+ ++DFVFMDSAKSKYIVFLP++L+ L VGGW+LDD+FQGGDI 
Sbjct: 121 IRLLEGDAADILSTLEGNFDFVFMDSAKSKYIVFLPEILRLLKVGGWILDDVFQGGDIT 180 



Query: 181 KPIDEVRRGQRTIYRGLQRLFDSTLQHPDLTATLVPLGDGLLMIRKNADHIVLED 235 

KPI+++RRGQRTIYRGI1Q LFD+TL HP+LT +LVPL DGLLMIRKN IVL D 
Sbjct: 181 KPIEDIRRGQRTIYRGLQSLFDATLTHENLTTSLVPLSDGLLMIRKNQADIVLPD 235 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 2148 

A DNA sequence (GBSx2264) was identified in S.agalactiae <SEQ ID 6639> which encodes the amino 
acid sequence <SEQ ID 6640>. This protein is predicted to be phosphoglycolate phosphatase. Analysis of 
this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2193 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
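
The "Final Results" block lists one certainty value per predicted location, and the prediction is the location with the highest value. The following Python sketch shows one way such a block could be read; it is illustrative only, the names RESULT_LINE and parse_final_results are hypothetical, and the regular expression is an assumption about the line layout seen in this document (it tolerates stray spaces inside the number and any separator between the location and "Certainty").

```python
import re

# location name, certainty (possibly with stray spaces), verdict in brackets
RESULT_LINE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d. ]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        site, value, verdict = match.groups()
        results[site] = (float(value.replace(" ", "")), verdict)
    return results


# Values taken from the Final Results block above (Example 2148).
report = """\
bacterial cytoplasm --- Certainty=0.2193 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(report)
best_site = max(results, key=lambda site: results[site][0])
print(best_site, results[best_site])
```

When all three certainties are 0.0000, as in some examples in this document, the maximum is not meaningful and the verdict field ("Not Clear") should be consulted instead.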

A related GBS nucleic acid sequence <SEQ ID 8985> which encodes amino acid sequence <SEQ ID 8986> 
was also identified. This protein appears to be a hydrolase i.e. an exposed protein. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA91552 GB:Z67740 unidentified [Streptococcus pneumoniae] 
Identities = 39/117 (33%) , Positives = 67/117 (56%) , Gaps = 9/117 (7%) 

Query: 98 KEQESRDSKIHLM-PYAKEILEWTKEQDIPNFMYTHKGASTHSVLETLQISHYFDEILTG 156 
KE E+R+ + ++ ++LE Q +F+ +H+ +LE 1+ YF E++T 

Sbjct: 25 KENEftRELEHPILFEGVSDLLEDIIMQGGRHFLVSHRNDQVLEILERTSIAAVFTEVVTS 84 

Query: 157 VSGFERKPHPQGINYLVKRYSLDKSMTYYIGDRPLDLEVAQNAGIKS INLR 207 

SGF+RKP+P+ + YL ++Y + + IGDRP+D+E Q AG+ + +NLR 
Sbjct: 85 SSGFKRKPNPESMLYLREKYQISSGLV--IGDRPIDIEAGQAAGLDTHLFTSIVNLR 139 

SEQ ID 8986 (GBS240) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 57 (lane 2; MW 26kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 61 (lane 3; MW 51.5kDa). 

GBS240-GST was purified as shown in Figure 225, lane 12. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2149 

A DNA sequence (GBSx2265) was identified in S.agalactiae <SEQ ID 6641> which encodes the amino 
acid sequence <SEQ ID 6642>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2620 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6643> which encodes the amino acid 
sequence <SEQ ID 6644>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2967 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 463/599 (77%) , Positives = 541/599 (90%) 


Query: 1 MSDNRSHIEEKYQWDLTTVFATDELWETEWELTQAIDNAKBFSGHLLDSSQSLLEITEV 60 

M+DNRSH+EEKY WDL+T+FATD+ WE EV +L ++ +KGF+GHLI1DSS +I1L++T+ 
Sbjct: 1 lOTDITOSHLEEKyTTOLSTIPATDKDWEAEVSDIATEVEaSKGFJ«3HLLDSSMnJLK^ 60 

Query: 61 ELDLSRRLEKVYVYASMKNDQDTTVAKYQEFQAKATALYAKFSETFSFYEPELLQLSESD 120 

L+L+RR+EKVYVYA MK[qDQDTTVAKyQE+QAKA+ LYAKFSE FSFY+PE++ L + D 
Sbjct: 61 YLELARRVEK\rmAimKiroQDTTOAKYQEYQAKASGLYAKFSEVFSFYDPEVMMLHQBD 120 

Query: 121 YQSFLLEMPDLQKYDHFFEKIFANKPHVLSQNEEELLAGASEIPGAAGETFEILDNADMV 180 
YQ+FL E P+L+ Y+HFF+K+F + HVLSQ EEELLAGA EIF A ETF ILDNAD+V 

Sbjct: 121 YQAFLTETPELKVYMHFFDKLFQaREHVLSQAEEELLAGAQEIFNGAEETFSILDNADIV 180 

Query: 181 FPWKtlAKBEEVELTHGNFISLlffiSSDRTVRKEAYQAMYSTYEQFQHTyAKTranWKSQ 240 
FPWKN KGE+VELTHGNPISLMES DR+VR+ Ay+AMYSTYEQFQHTYAKTLQTNVK Q 
Sbjct: 181 FPVVKtroKGEDVELTHGNPISLMESKDRSTOQRAYEAMYSTYEQFQHTYAKTLQTl^^ 240 

Query: 241 NPKARVHHYQSARQSALSANFIPEEVYETLIKTVNHHLPLLHRYMKLRQKVLGLDDLKMY 300 

N+KARVH Y SaRQ+A++ANFIPE VY+TL++TVN HLPLLHRY+KLRQ+VLGLDDIjKMY 
Sbjct: 241 NYKARVHKYDSARQAAMRftNFIPEAVYDTLLETVNKHLPLLHRYLKLRQEVIfil^^ 300 


Query: 301 DVYTPLSQMDMSFTYDEALKKSEEVLAIFGEAYSERVHRAFTERWIDVHVNKGKRSGAYS 360 

DVYTPLS+' D++ . YDEAL+K+E+VI1A+FG+ Y++RVHRAFTERWIDVHVNKBKRSGAYS 
Sbjct: 301 0\nP^PLSETDLAIGYDE7U:lEKAEK^^aVFGKDYADRVHRAFTERWIDVH^7NKB^^ 360 

Query: 361 GGSYDTNAFMLLNWQDTLDNLYTLVHETGHSLHSTFTRENQPYVYGDYSIFLAEIASTTN 420 

GGSYDTNAF+LLNWQDTLDNLYTLVHETGHSLHSTFTRE QPYVYGDYSIFLAEIASTTN 
Sbjct: 361 GGSYDTNAFlLUiWQDTLDNLYTIiVHETGHSLHSTFTRETQPYWGDYSIFLftEI^^ 420 

Query: 421 ENILTETLLKEVKDDKNRFAIIjNHYI£)6FRGTIFRQTQFAEFEHAIHVAIX3E^ 480 
ENI+TE LL EV+D+K RFAIIiNHYIiDGF+GT+FRQTQFAEFEHAIH ADQ+G+VLTSEY 

Sbjct: 421 ENIKraMIJiJEVQDEKERFAIIMraJDGFRGWFRQTQFAEFEHAIHQADQKGEVLT^^ 480 

Query: 481 lOjraLYAELNEKYYGLTKEDtraFIQYEWARIPHFYYNYYVFQYATGFAAAOT^ 540 
W LYA+LNEKYY6L+K+DNHFIQYEWARIPHFYYNYYV+QYATGFAAA+YLA++IV+G 
Sbjct: 481 IJIQLYADIJ^raKYYGLSKKDlffiFIQYEWARIPHFYYl^YYVYQYATGFAAASYLBDKIVHGT 540 

Query: 541 PEDKEAYIJmiKftGNSDYPLlWIAKaGVDMTSADYLDAAFROTEERLVELEKttiV^ 599 

+D + YL YLK+GNSDYPL VIAKAGVDM DYL+AAF+VF+ERL ELE LV+KG+H 
Sbjct: 541 QDDIDHYIAYLKSGNSDYPLEVIAKAGVDMEKGDYLEAAFKVFDERLTELEVLVSKGIH 599 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2150 

A DNA sequence (GBSx2266) was identified in S.agalactiae <SEQ ID 6645> which encodes the amino 
acid sequence <SEQ ID 6646>. This protein is predicted to be competence protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2955 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



60 The protein has homology with the following sequences in the GENPEPT database. 
>GP:AAC23746 GB:AF052209 competence protein [Streptococcus pneumoniae] 
Identities = 127/269 (47%) , Positives = 176/269 (65%) , Gaps = 8/269 (2%) 

Query: 1 MLIAKDKQGNLINLLESHPGKGQYFCPTCCSAVRLKAGRIMRRHFAHISLKNCQFYHENE 60 
M +A+D +G L+N+LE K Y CP C + L+ G +R HFAH SLK+C F+ ENE 

Sbjct: 1 MFVJ«?DARGELVNVLEDKI,EKC3aYrCPAa3GQLHLRQGPSTOTHFJai^^ 60 

Query: 61 SNEHLQLKAKLYMSLSRENETMLEHHLPEINQIMDLFVNETLALE VQCSRLSEQRL 116 

S EHL K LY L +E + LE+ L E+ QIAD+FVN lALE V C + + L 
Sbjct: 61 SPEHIiKNKESLYHWLKKETKVQLEYPLSELKQIM>VFVNGNIALESSVWPCLK---KVL 117 

Query: 117 RERTKAYLQRDFQWWLLGBKLWLKHRLTNLHKQFLQFSQSIGFHiWELDLRLEVLRLKY 176 

+ER++ Y +QV WLLG+KLWLK RLT L FL FSQ++GF++WELD H-VLRLKY 
Sbjct: 118 KERSEGYRSQGYQVLWLMQKLTOiKERLm.QAGFLYFSQNMGFYVWELDRGRQVLRLKy 177 


Query: 177 LIYEDLRGHVYYLSKTCPL-SGDVLAFLKWPYQSKNIOTYKVKQDRNIRDYVRQQLRYGN 235 

LIY+DLRG ++Y K G +Ii L+ PY+ + ++ + V +D++I Y+RQQL Y N 

Sbjct: 178 LIYQDLRGKLHYQIKEFSYGQGSIiLEILRLPYKRQKISHFTVSEDKDICRYIRQQLYYC2N 237 

Query: 236 QFWLRRQEKAYLSGQNLLTQELMMFFPQI 264 

FW+++Q +AY G+N+LT L ++PQI 
Sbjct: 238 LFWMKEQAEA.YQKGENILTYGLKEWYPQI 266 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6647> which encodes the amino acid 
sequence <SEQ ID 6648>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1034 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 154/312 (49%) , Positives = 204/312 (65%) , Gaps = 1/312 (0%) 

Query: 1 MLIAKDKQGNLINLL-ESHPGKGQYFCPTCCSAVRLKAGRIMRRHFAHISLKNCQFYHEN 59 

+L A D + LI+L+ + K + CP C S VRL+ G I R HFAH+ L +CQF EN 
Sbjct: 4 ILTALDDKNQLISLVTQPISTKPPFRCPACKSFVREiRQGTIRRPHFAHVQLAHCQFQAEN 63 


Query: 60 ESNEHLQLKAKLYMSLSRENETMLEHHLPEINQIADLFVNETLALEVQCSRLSEQRLRER 119 

ES EHL LKAKLY SL R +E +LPE+ QIADL+VN+ LALE+QCS L +RL++R 

Sbjct: 64 ESEEHLTLKAKLYTSLTOTEAVCIEKYLPELQQIADLWVNDKLALEIQCSPLPVERLKKR 123 

Query: 120 TKRYLQADFQVRWLLGEKLWLKHRLTNLHKQFLQFSQSIGFHIWELDLRLEVLRLKYLIY 179 

TKRY + + VRWLLG KLWL LT L KQFL FS S+GFH+WELD +I1RLKYLI+ 
Sbjct: 124 TKMQEK<KPTOWLLGRKLtnOTHLTALQKQFLYFSSSLGFHLWELI»^^ 183 

Query: 180 EDLRGHVYYLSKTCPLSGDVLAFLKWPYQSKNIiNFYKVKQDRNIRDYWQQ^^^ 239 

EDL G V YL+KT L +++ + PYQ + L Y+ K N+ +++ L + WL 

Sbjct: 184 EDLFGKVSYLTKTISLDHNIMEMFRLPYQQEILYSYQKKMTVNLSKRIQRALLARHPKWL 243 

Query: 240 RkQEKAYBSGQNLLTQELMMFFPQXQPPRVDTDFCQITNSLTSFYQNFTNYYQKNKNNLD 299 
R+QEKAYLSG NLL F+PQ +P + + FCQI +L +Y++F YY+K K+ 

Sbjct: 244 RRQEKAYLSGYNLLMLTTDAFYPQWRPVQSSSGFCQIKtaSILRPYYESFKAraT^ 303 

Query: 300 QTLYPPVFYDKI 311 

Qrij+ P K+ 
Sbjct: 304 QTLFSPKYYVKM 315 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 2151 

A DNA sequence (GBSx2267) was identified in S.agalactiae <SEQ ID 6649> which encodes the amino 
acid sequence <SEQ ID 6650>. This protein is predicted to be bicyclomycin resistance protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have a cleavable N-term signal seq. 
Final Results 

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA15047 GB:AJ235272 BICYCLOMYCIN RESISTANCE PROTEIN (bcr1) 
[Rickettsia prowazekii] 
Identities = 86/336 (25%) , Positives = 159/336 (46%) , Gaps = 28/336 (8%) 



Query: 73 GKKNTVLLGLCLILMSGFISFFTSNPSLAMASRLLLGIGIGLYNSLSISIITDLYEADER 132 

G++ VLLGL + ++S IS F+ N + M +R + Q+ + + + S+ D Y+ E 
Sbjct: 70 GRRPIVLLGLPIYIVSSIISIFSENlEMLMIARFIOAFGVSVGSVIGQSMARDSyQGAEL 129 

Query: 133 ASMIGIJlTASLNIGKALTTFIVGLVIA-IGVNYIYLVyLLVIPVFF-PFWKNVPEVENQT 190 

+ + + + L AL ++I G ++ + +Y+++ + L + +++ +PE 
Sbjct: 130 SYVyAILSPWLLFIPALGSYIGGYIlEYLSWHYVFIFFSLAGTILLALYYQILPETNYYI 189 

Query: 191 HTLKASTTFDT KAALLMLITFLVGI---AYIGATVKIPTLLVTKYHYATSFSSNM 242 

++S F+ K +L L P++G Y G ++ P +L+ + SF + 

Sbjct: 190 AFSQSSKYFEVFNIIIKDKMLWLYAPIIGAFNGIYYGFFIEAPFILIDQMRVLPSFYGKL 249 

Query: 243 LTLLAFSGILVGSVFGKLVK---VFQEKTLLIMILAMGIGNVLFALANNQIIFIVAS--I 297 

LL+F+ I G + G L+K V+ +K + I + G +LFA+ + + FI+ S 
Sbjct: 250 AFLLSFASIFGGFLGGYLIKKRQVYDKKVMSIGFIFSLCGCILFAVDSFILEFILVSNVF 309 

Query: 298 LIGASFVGTM SSVFFYISKNYAKEHNNPITSLALTAGNI-GVILTPLI--LTKLP 349 

I F+ M S+ 1+ YA E +T TAG+I G I +1 +T 

Sbjct: 310 AIAMIFMPimiHMIGHSLLIAITLRYAIiEDYATVT6---TAGSIFGAIYYWIASVTYCV 366 

Query: 350 SQLHLEPFMTPFLITSGLMVINV--FVYLVLMSKNK 383 

S++H E L+ L + +V F y+ L+ K K 

Sbjct: 367 SKIHGETISNFSLLCLVLSISSVISFYYICLLYKKK 402 

A related GBS gene <SEQ ID 8987> and protein <SEQ ID 8988> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 

McG: Discrim Score: 6.28 

GvH: Signal Score (-7.5): -2.45 
Possible site: 25 

>>> Seems to have a cleavable N-term signal seq. 

ALOM program count: 10 value: -8.33 threshold: 0.0 

INTEGRAL Likelihood = -8.33 Transmembrane 78 - 94 ( 75 - 96) 
INTEGRAL Likelihood = -8.33 Transmembrane 269 - 285 ( 267 - 287) 
INTEGRAL Likelihood = -7.38 Transmembrane 290 - 306 ( 287 - 314) 
*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

ORF01955(517 - 1449 of 1749) 

EGAD|163303|RP603 (70 - 402 of 407) bicyclomycin resistance protein {Rickettsia prowazekii} 
OMNI|NT01RP0626 conserved hypothetical protein GP|3861147|emb|CAA15047.1||AJ235272 
BICYCLOMYCIN RESISTANCE PROTEIN (bcr1) {Rickettsia prowazekii} PIR|E71665|E71665 
bicyclomycin resistance protein (bcr1) RP603 - Rickettsia prowazekii 
%Match =5.9 

%Identity =26.5 %Similarity =52.0 

Matches = 85 Mismatches = 141 Conservative Sub.s = 82 



474 504 534 564 594 624 654 684 

SLOTIPAMMITIFVILSNFVVTKLGKKNTVLLGLCLILMSGFISFFTSNFSLAMASRLLLGIGIGLYNSLSISIITDLYE 

!:•: mil : ::| |||: | : | :]:: |: : : : |: \ \ : 
MTSTLYFLGFAVGILSLGRLSDIYGRRPIVLLGLFIYIVSSIISIFSFNIEMLMIARFIQAFGVSVGSVIGQSMARDSYQ 
60 70 80 90 100 110 120 



714 744 774 ,801 831 858 888 

ADERASMIGLRTASLNIGKALTTFIVGLVIA-I6VNYIYLVYLLVIPVFF-FFWKNVPEVENQTHTLKAST---TFOT 

I : : : : I || ::|| :: : :|::: : | ::: :::: :|1 ::| |: 

GAELSYWAILSPWLLFIPALGSYIGGYIIEYLSWHYVFIFFSIJ«3TILIALYYQILPETNYYIAFSQSSKyFEVENIII 
140 150 160 170 180 190 200 

933 954 984 1014 1044 1074 1095 1125 

KAALLMLITFLVG- - - lAYIGATVKIPTLLVTKYHYATSFSSNMLTLLAPSGILVGSVFGKLVK- - - VFQEKTLLIMILA 
I :| I I::| 1 1 == I :|: = 11 = Ibh h I : I hi h =1 = I = 

kdkmlwlyafiigaengiyygffieapfilidqmrvlpsfygkiafllspasifggflggylikkrqvydkk™ 

220 230 240 250 260 270 280 



1155 1182 1209 1224 1254 1284 1311 1335 

MGIGNVLFALANNQIIFI-VASIL-IGASFVGTM SSVFFYISKNYAKEHNNFITSLALTAGNI-GVILTPLI--L 

1 :llh : : II h = := I 1= I 1 = = h III =11 hi II =1 = 

SLCGCILFAVDSPILEFILVSNVFAIAMIFMPMMIHMIGHSLLIAITLRYALEDYATVTGTA- - -GSIFGAIYYWIASV 
300 310 320 330 340 350 360 



1365 1395 1419 1449 1479 1509 1539 1569 

TKLPSQLHLEPFMTPFLITSGLMVINV- - FVYLVIjyiSKNK*KyiRKDNFPRIVKVGEKMLIAKDKiQGI!JLINLLESHPGKjG 

I h = l I h I = :| I h 1= II 

TYCVSKIHGETISNFSLLCLVLSISSVISFYYICLLYKKKSIIIN 
380 390 400 



There is also homology to SEQ ID 400 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 2152 

A DNA sequence (GBSx2268) was identified in S.agalactiae <SEQ ID 6651> which encodes the amino 
acid sequence <SEQ ID 6652>. This protein is predicted to be 16S pseudouridylate synthase (rsuA). 
Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2645 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB0e992 GB:AP001518 16S pseudouridylate synthase [Bacillus halodurans] 
Identities = 106/234 (45%) , Positives = 141/234 (59%) , Gaps = 1/234 (0%) 



Query: 1 MRLDKLLGQAGFGSRNQVKKLICSRQVSVDGQIVTKDNVIVDSGLQSIFVGKERVCLKKS 60 
MR+DK L GFGSR VKKL+ + V V GQ + + V+ +SI V E V K 
Sbjct: 1 MRIDKFLANMGFGSRKDVKKLLKTGAWVQGQPIKDPSTHVEPE^^^ 60 

Query: 61 SYYLLYKPSGVVSAVRDSEHKTVIDLISEKDKVEGLYPIGRLDRDTEGLLIVTNNGPLGY 120 
Y ++ KP GV+ A D EH+TVIDL i E i -H- P+GRLD+DT GLL++TN+G + 
Sbjct: 61 VYLMMNKPKGVICATEDLEHETVIDLLGEEERHYEPSPVGRLDKDTVGLLLITNDGKENH 120 

Query: 121 RMLHPKHHVAKTYYVEVNGFLERDAITFFEEGVVFDDGTKCKPAELTIDTANNDKSTARI 180 
++ PKHHV KTY V G + + + F GW DDG KPA L 1 A +S + 
Sbjct: 121 VMSPKHHVPKTYRALVEGHVTEEDVGAFSHGWLDDGYVTKPATLHILEA-GARSHIEL 179 

Query: 181 TITEGKFHQVKKMFLAYGVKVIYLRRISFGDLRLDMNLKPGQYRRLRDSEAAIL 234 
+TEGKFHQVK+MF A G +V+ L RI G+L LD L G+YR L E A+L 
Sbjct: 180 ILTEGKFHQVKRMFQAVGKRVLELERIKIGNLLLDPELARGEYRELTKEEIALL 233 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6653> which encodes the amino acid 
sequence <SEQ ID 6654>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

»> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 . 3310 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/194 (57%) , Positives = 138/194 (70%) 



Query: 1 MRLDKLLGQAGFGSRNQVKKLICSRQVSVDGQIVTKDNVIVDSGLQSIFVGKERVCLKES 60 
MRLDKLL GSR+QVKKLI ++ V VD VD GLQ I V +RV + 
Sbjct: 1 MRLDKLLEGTKVGSRSQVKKLIKAQGVWVDHMPARNGRQNVDPGLQLIEVTGQRVTHPKH 60 

Query: 61 SYYLLYKPSGVVSAVRDSEHKTVIDLISEKDKVEGLYPIGRLDRDTEGLLIVTNNGPLGY 120 
SY +L KPSGVVSA +D+ + TVID ++E+DK LYP+GRLDRDTEGL+++T+NGPLG+ 
Sbjct: 61 SYIILNKPSGVVSAKKDTNYLTVIDQLAEEDKSPDLYPVGRLDRDTEGLVLLTDNGPLGF 120 

Query: 121 RMLHPKHHVAKTYYVEVNGFLERDAITFFEEGVVFDDGTKCKPAELTIDTANNDKSTARI 180 
RMLHP HHV+KTY V VNG L DA FF G+ F G +C+PA+LTI A+ D+S A + 
Sbjct: 121 RMLHPSHHVSKTYLVTVNGLLAEDASDFFAAGICFPTGEQCQPAQLTILKADTDQSQASL 180 

Query: 181 TITEGKFHQVKKMF 194 
TI+EGKFHQVKK F 
Sbjct: 181 TISEGKFHQVKKCF 194 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2153 

A DNA sequence (GBSx2269) was identified in S.agalactiae <SEQ ID 6655> which encodes the amino 
acid sequence <SEQ ID 6656>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9745> which encodes amino acid sequence <SEQ ID 9746> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA18872 GB:D90917 hypothetical protein [Synechocystis sp.] 
Identities = 197/318 (61%) , Positives = 243/318 (75%) 

Query: 22 MGLLVDGKWVDQWYDTASTGGKFVRTVTQFRHVOTKDGSAGPSGnAGFKAESGRYHLVVS 81 

MGLLV+G W DQWYDT STGG+FVR +QFRHW+T DGS GP+G GFKaE+GRYHLYVS 
Sbjct: 1 MGLLVNGIWCPQWYDTESTGGRFVRHDSQFRHWITPDGSPGPTGHGGFKAEAGRYHLYVS 60 

Query: 82 lACPWASRVLIMRKLKNLESHISISlVNPLMLENGWTFQEYKGVIPDMINQSQYLYQIYQ 141 
LACPWA R LI RKLK LE I +S+V+ LM ENGWTF GV+PD + ++YI1YQIY 

Sbjct: 61 LACPWAHRTLIFRKLKGLEGMIDVSVVHWLMRENGWTFAPGPGVMPDPLFNaEYLYQIYT 120 

Query: 142 ASQSDYTGRVTVPVLWDKKFHTIVNNESSEIMRMLm'AKNHITGNTDDYYPDSLQGQIDE 201 
+ + Y+GRVTVP+LWDK+ TIVMNESSEI+R+ N+AF+ + + DYYP +L+ QID 
Sbjct: 121 RMAQYSGRVTVPILVroKXJKQTIVNNESSElIRIENSaFDGLGftKSGDYYPKAIJRTQII^ 180 






Query: 202 MNNFIYPKINNGVYKAGFATSQNVYQKEVETLFTALDQLEKHLSDNHYLVGEQFTEADIR 261 

+N+ lY INNGVYK GFAT+Q Y++ + LF +LD LE L + YL G++ TEAD R 
Sbjct: 181 LNDRIYHTINNGVYKCGFATTQTAYEEAIAPLFESLDWLEGILQGHQYLTGDEITEADWR 240 

Query: 262 LFTTLVRFDTVYYGHFKCISrLKAIJJDYPHLWHYTKRIYOTiPGIAETVNFDHIKKHYYGSHK 321 

LFTTL+RFD VY GHFKCNL+ + DYP+LW Y + +Y+ PGIAETVNF HIK HYY SH 
Sbjct: 241 LFTTLIRFDVVYVGHFKCNLRRIQDYElSn^WRYIiRDLYHQPGIAETVNFQHIKGHYYESHri 300 

Query: 322 TINPTGIIPAGPNLDWTI 339 

INPTGI+P GP LD ++ 
Sbjct: 301 NINPTGIVPMGPALDLSL 318 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 6656 (GBS655) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 143 (lanes 2-4; MW 27kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2154 

A DNA sequence (GBSx2270) was identified in S.agalactiae <SEQ ID 6657> which encodes the amino
acid sequence <SEQ ID 6658>. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1116 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12030 GB:Z99105 similar to glucosamine-6-phosphate isomerase
[Bacillus subtilis]

Identities = 112/243 (46%), Positives = 163/243 (66%), Gaps = 10/243 (4%)

Query: 1 MRVITVKNDIEGGKIAPTLLEEKMKAGAQT-LGtATGSSPITFYEEIVKS NLDFSN 55
M+++ ++ E K++ +++E+++A LGIATGS+P+ Y++++ +DFS
Sbjct: 1 MKILIAEtrffiELCKLSAAIIKEQIQAKKDaVLGrATGSTPVGLYEQLISDYQaGEIDFSK 60

Query: 56 MVSim^DEWGIAASNDQSYSYFMHKHLFimKPFKENNL- -ENGIJUaDLKEEIKRYDAVI 113
+ + NLDEY G++ S+ QSY++FMH+HLF + +++ P G L+ K Y+ +1
Sbjct: 61

Query: 114
A ID QILGIG NGHIGFNEPG+ F+ T W L+ STI+AN+RFF VP+ A
Sbjct: 121

Query: 172
+SMGI +IM+ SK IVL+A G EKA+AI M +GP+T D+PAS1LQKH+ V +1 D AA
Sbjct: 181

Query: 231
KL
Sbjct: 241 QKL 243

A related DNA sequence was identified in S.pyogenes <SEQ ID 6659> which encodes the amino acid 
sequence <SEQ ID 6660>. Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.59 Transmembrane 174 - 190 ( 174 - 190) 

Final Results 

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12030 GB:Z99105 similar to glucosamine-6-phosphate isomerase
[Bacillus subtilis]

Identities = 120/244 (49%), Positives = 162/244 (66%), Gaps = 12/244 (4%)

Query: 1 MKIIRVQDQIEGGKIAFTIiLKDSL-AKGAKTLGLATGSSPISFYQEMVKS PLDFSD 55
MKI+ + E K++ ++K+ + AK LGLATGS+P+ Y++++ +DFS
Sbjct: 1 MKILIAEHYEELCKLaAAIIKEQIQaKKDAVLGLATGSTPVGLYKQLISDYQAGEIDFSK 60

Query: 56 LTSINLDEYVGLSVESDQSYDYFMRQNLF- - -NAKPFKKNYLENGLATDVEAEAKRYNQI 112
+T+ NLDEY GLS QSY++FM ++LP N +P ++P G +EA K Y +
Sbjct: 61

Query: 113
I + ID Q+LGIG NGHIGFNEPG+ FE+ T W L ESTI+AN+RFF VP+
Sbjct: 120

Query: 171
AISMGI +IM+ S+ IVLLA G+EKADAI+ M GP+T +PASII.QKH+HV VI D A
Sbjct: 180

Query: 230 ASQL 233
A +L
Sbjct: 240 AQKL 243

An alignment of the GAS and GBS proteins is shown below.

Identities = 163/233 (69%), Positives = 201/233 (85%)

Query: 1 MRVITVKiroiEGGKIAFTLLEEKMKAGaQTLGIiATGSSPITFYEEIVKSNI^ 60 
M++I V++ IEGGKIAPTLL++ + GA+TLGLATGSSPI+FY+E+VKS LDFS++ SIN 
Sbjct: 1 MKIIRVQDQIEGGKIAFTLLKDSLAKGAKTLGLATGSSPISFYQEMVKSPLDFSDLTSIN 60

Query: 61 LDEYVGIAASNDQSYSYFMHKHIjFDAKPFKENNLPMGLAKDLKEEIKRYDAVINANPIDF 120 

LDEYVG++ +DQSY YFM ++LF+AKPFK+N LPNGLA D++ E KRY+ +1 +PIDF 
Sbjct: 61 IBEYVGIiSVESDQSYDYFMRQNLENAKPFKKNYLPNGLATDVEAEAKRYNQIIAEHPIDF 120 

Query: 121 QILGIGRNGHIGENEPGTPFDITTHWDLAPSTIEfiNSRFBTJSIDDVPKQALSMGIGSIM 180 

Q+LGIGRNGHIGFNEPGT F+ THWDL STIEANSRFF SI+DVPKQA+SMGI SIM 
Sbjct: 121 QVLGIGRNGHIGENEPGTSFEEETHWDLQESTIEANSRFFTSIEDVPKQAISMGIASIM 180 

Query: 181 KSKTIVLVAYGIEKAEAIASMIKGPITEDMPASILQKHDDWIIVDEAAASKL 233

KS+ IVL+A+G EKA+AI M+ GPITE +PASILQKHD V++IVDEAAAS+I. 
Sbjct: 181 KSEMIVLLAFGQEKADaiRGMVFGPITEHLPASILQKHDHVIVIVDEAaASQL 233 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2155 

A DNA sequence (GBSx2271) was identified in S.agalactiae <SEQ ID 6661> which encodes the amino 
acid sequence <SEQ ID 6662>. Analysis of this protein sequence reveals the following: 

Possible site: 61 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.12 Transmembrane 169 - 185 ( 161 - 194)
INTEGRAL Likelihood = -6.37 Transmembrane 151 - 167 ( 145 - 168)
INTEGRAL Likelihood = -5.15 Transmembrane 42 - 58 ( 41 - 62)
INTEGRAL Likelihood = -1.59 Transmembrane 207 - 223 ( 207 - 224)
INTEGRAL Likelihood = -1.12 Transmembrane 24 - 40 ( 23 - 40)

Final Results 

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
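
The "Final Results" block reported for each protein lists one certainty score per candidate localization, with the highest score marked "(Affirmative)". As an illustration only, such a block could be read programmatically along the following lines. This is a sketch that assumes lines of the form `bacterial <site> --- Certainty=<score> (<call>)`; the names `RESULT_RE`, `parse_final_results` and `best_site` are hypothetical and are not part of any prediction program's interface.

```python
import re

# Assumed line shape: "bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>"
RESULT_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s*-*\s*Certainty\s*=\s*(?P<score>[0-9.]+)\s*"
    r"\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return (site, score, call) tuples for every result line found in text."""
    return [
        (m.group("site"), float(m.group("score")), m.group("call").strip())
        for m in RESULT_RE.finditer(text)
    ]


def best_site(text):
    """Return the result tuple with the highest certainty, or None if none parse."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None


example = """
bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

print(best_site(example))
```

Selecting the maximum score simply mirrors the "(Affirmative)" label; it does not add information beyond what the listing already states.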

The protein has homology with the following sequences in the GENPEPT database. 



>GP:AAF13747 GB:AF117351 unknown [Zymomonas mobilis] 
Identities = 88/216 (40%), Positives = 123/216 (56%)

Query: 9 QQLNILRAGVLGANDGIISVAGWIGVASATHNLWIIFLSAASAILAGAFSMAGGEYVSV 68 

+Q+ LRA VLGANDGI+S + ++IGVASA + I L+ S ++AGA SMA GEYVSV 
Sbjct: 17 RQMGWLRASVLGANDGILSTSSIMIGVaSAHGSSGNILLAGMSGLIAGALSMAAGEYVSV 76 

Query: 69 STQKDTEQAAVAREEKLLENNPELAKKSLVDIYLAKGESHEHAQWLVDKAFSKNAIEHLV 128

S+Q D EQA VARE L+ NP K L +IY+ +G E A + ++ + NA+E + 
Sbjct: 77 SSQHDMEQADVAREHAELKANPHAEKHELAEIYVERGLDRELALQVAEQLMAHNALEAHL 136 

Query: 129 EEKYGIEFGEYTSPWHAAISSFIAFAIGSIFPTITILLLPFSVRIVGTVIIVIVSLLSTG 188 
++ 6+ P AA++S I+F+ 6+1 P +T L P + + +1 1+ L G

Sbjct: 137 RDELGLTDSLIARPVQAALASAISFSGGAIVPFLTALFSPPEIINITISLISILCLAVLG 196 

Query: 189 YVSAKLGQAPTVPAMRRNVMIGCLTMLATYVIGQLF 224 
VALGA AR GLM+TIGF 
Sbjct: 197 MVGAHLGGANVPKAALRVTFOSALAMIGTAAIGSFF 232

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2156 

A DNA sequence (GBSx2272) was identified in S.agalactiae <SEQ ID 6663> which encodes the amino
acid sequence <SEQ ID 6664>. This protein is predicted to be S-adenosylmethionine tRNA
ribosyltransferase (queA). Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3438 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14732 GB:Z99118 S-adenosylmethionine tRNA ribosyltransferase
[Bacillus subtilis]
Identities = 228/341 (66%), Positives = 279/341 (80%)

Query: 1 MWTNDFDFYLPEELIAQTPLEKRDASKLLVIDHKNKTMTDSHFDHILDELKPGDALVMNN 60
M + FDF LPE LIAQ PLE+RDAS+L+V+D +TDS F HI+ GD LV+NW
Sbjct: 1 MKVDLFDFELPERLIAQVPLEQRDASR]M\n:iDKHTGELTDSSFKHIISFFNEGDCLVUmi 60

Query: 61 TRVLPARLYGEKQDTHGHVELLLLKNTEGDQWEVLAKPAKRLRVGTKVSFGDGRLIATVT 120
TRVLPARL+G K+DT VELLLLK GD+WE liRKPAKR++ GT V+FGDGRL A T
Sbjct: 61 TRVLPARLFGTKEDTGAKVELLLLKQETGDKWETLAKPAKRVKKGTWTFGDGRLKAICT 120

Query: 121 KELEHGGRIVEFSYDGIFLEVLESLGEMPLPPYIHEKLEDRDRYQTVYAKENGSAAAPTA 180
+ELEHGGR +EF YDGIF EVLESLGEMPLPPYI E+L+D++RYQTVY+KE GSAAAPTA
Sbjct: 121 EELEHGGRKMEFQYDGIFYEVLESLGEMPLPPYIKEQLDDKERYQTVYSKEIGSAAAPTA 180

Query: 181 GLHFTKELLEKIETKGVKLVYLTLHVGLGTFRPVSVDNLDEHEMHSEFYQLSKEAADTLN 240
GLHFT+E+L++++ KGV++ ++TLHVGLGTFRPVS D ++EH MH+EFYQ+S+E A liM
Sbjct: 181 GLHFTEEILQQLKDKGVQIEFITLHVGLGTFRPVSftDEVEEHNMHAEFYQMSEETAAALN 240

Query: 241 AVKESGGRIVAVGTTSIRTLETIGSKFNGELKADSGWTNIFIKPGYQFKWDAFSTNFHL 300
V+E+GGRI++VGTTS RTLETI + +G+ KA SGWT+IFI PGY+FK +D TNFHL
Sbjct: 241 KVRENGGRIISVGTTSTRTLETIAGEHDGQFKASSGWTSIFIYPGYEFKAIDGMITNFHL 300

Query: 301 PKSTLVMLVSAFAGRDFVLEAYMHAVEERYRFFSFGDAMFV 341
PKS+L+MLVSA AGR+ +L AYNHAVEE YRFFSFGDAM +
Sbjct: 301 PKSSLIMLVSALAGRENIIiRAYMHAVEEEYRFFSFGDAMLI 341

A related DNA sequence was identified in S.pyogenes <SEQ ID 6665> which encodes the amino acid
sequence <SEQ ID 6666>. Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3864 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 297/341 (87%), Positives = 322/341 (94%)

Query: 1 ^I[^mroFDFYLPEELIAQTPLEKRDASKLLVIDHKNKTMTDSHFDHILDEIiKPGDALVM^^ 60

MNTN+FDF LPEELIAQTPLEKRD+SKLL+IDH+ KTM DSHFDHI+D+L PGDALVMKN 
Sbjct: 1 MNTNNFDFELPEELIAQTPLEKRDSSKLLIIDHRQKTMVDSHEDHIID^^ 60 

Query: 61 TRVLPARLYGEKQDTHGHVELLLtKtOTGDQWEVLaKPAKRIjRVGTKVSFGDGRLlAT^ 120

TRVLEftRLYGEK OTHGHVEIJiI)LKOT+GDQWEVLaKPAKRL+^ AT+ 
Sbjct: 61 TRVLPARLYGEKPDTHGHVELLLLKNTQGDQimVIJUCPAKRLKVGSQWFGDGRLKAT 120 

Query: 121 KELEHGGRIVEFSYDGIFLEVLESLGEMPLPPYIHEKLEDRDRYQTVYAKENGSAAAPTA 180 
ELEHGGRIVEFSYDGIFLEVLESLGEMPLPPYIHEKLED +RYQTVYAKENGSAAAPTA

Sbjct: 121 DELEHGGRIVEFSYXXSIFLEVLESLGEMPLPPYIHEKLEDaERYQTVYAKENGSAAAPTA 180 

Query: 181 GIJIFTKELLEKIETKGVKLVYLTLHVGI/STFRPVSVDNLDEHEMHSEFYQLSKEAADTLN 240 

GLHFT +LL+KIE KGV LVYLTLHVGLGTFRPVSVDNLDEH+MHSEFY LS+EAA TL 
Sbjct: 181 GLHFTTDLLKKIEAKGVHLVYLTLHVGLGTFRPVSVDNLDEHDMHSEFYSLSEEAAQTLR 240

Query: 241 AVKESGGRIVAVGTTSIRTLETIGSKFNGELKADSGWTNIFIKPGYQFKVVDAFSTNFHL 300 

VK++GGR+VAVGTTSIRTLETIG KF G+++ADSGWTNIFIKPGyQFKWDAFSTNFHri 
Sbjct: 241 DVKQRGGRVYAVGTTSIRTLETIGGKFQGDIQADSGWTNIFIKPGYQFKVVDAFSTNFHL 300 

Query: 301 PKSTLVMLVSAFAGRDFVLEAYMHAVEERYRFFSFGDBMFV 341 

PKSTLVMLVSAFAGRDFVLEAY HAV+E+YRFFSFGDAMFV 
Sbjct: 301 PKSTLVMLVSAFAGRDFVLEAYRHAVDEKYRFFSFGDAMFV 341 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2157 

A DNA sequence (GBSx2273) was identified in S.agalactiae <SEQ ID 6667> which encodes the amino 
acid sequence <SEQ ID 6668>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -14.22 Transmembrane 14 - 30 ( 6 - 34)

Final Results

bacterial membrane --- Certainty=0.6689 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6669> which encodes the amino acid
sequence <SEQ ID 6670>. Analysis of this protein sequence reveals the following:

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2655 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 126/195 (64%), Positives = 155/195 (78%), Gaps = 1/195 (0%)

Query: 160 MEERFDITETDYEYIGEHHNYVAAFSGAMSIDDMQKYSLVYSENTPAYALAERIGGMDSA 219 
M ERFDITETDYEY EH+ YVA F+GAMSI DMQ+YSLVYSElSITPAYALaER+GGM+ A 
Sbjct: 1 OTERFDITETDYEYrCEHHAYVAQENGflMSIPDMQEYSLVYSENTPAYALaERLGGMNKA 60

Query: 220 YSKFGRYGQSKGDIKNIQKNGNKVTTDYYIQVLDYLWKHRKKYDSLITYLEEAFPTDYYR 279

Y F RYG+ Gil +NGNK+TT Yy+QVLDYLW+H+ KY ++ Y+ E+FP YY+
Sbjct: 61 YQLFDRYGKVSGAITTIDRNGNKITTAYYLQVLDYLWQHQDKYKDILYYIGESFPDLYYK 120

Query: 280 ALIPSDVWAQKPGYWEALNVGAIVKEEVPYIVAIYTAGLGGSTQEDSEINGVGLYQLE 339

+P V V QKPGYVREAUWGAIV EE PY++A+Y++GLGG+TQ E+NG+G QL 
Sbjct: 121 TYLP-HVKVYQKPGYWEAIiWGAIVCEESPYLIALYSSGLGGATQASEEVNGLGYVQLV 179 

Query: 340 QLCFVINQWHRVNMN 354 

QL +VIN+W+R N+N 
Sbjct: 180 QLPYVINEWYRGNUSI 194 

SEQ ID 6668 (GBS680) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 164 (lane 10-12; MW 64kDa) and in Figure 239 (lane 9; MW 64kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
164 (lane 15; MW 40kDa) and in Figure 188 (lane 9; MW 40kDa). Purified GBS680-His is shown in Figure
242, lane 8. Purified GBS680-GST is shown in Figure 246, lanes 6 & 7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2158 

A DNA sequence (GBSx2274) was identified in S.agalactiae <SEQ ID 6671> which encodes the amino
acid sequence <SEQ ID 6672>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence



INTEGRAL Likelihood = -4.57 Transmembrane

Final Results

bacterial membrane --- Certainty=0.2826 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2159 

A DNA sequence (GBSx2275) was identified in S.agalactiae <SEQ ID 6673> which encodes the amino 
acid sequence <SEQ ID 6674>. Analysis of this protein sequence reveals the following: 

Possible site: 59

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.4949 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC21770 GB:U32694 H. influenzae predicted coding region HI0092
[Haemophilus influenzae Rd]
Identities = 232/416 (55%), Positives = 314/416 (74%), Gaps = 3/416 (0%)

Query: 4 TFTTTGaLI6LftLAILLIIKKVHPAYSLIIX3ftLVGGLIGGGDLVTIVNT^W^ 63 

T + GaL+ L +AI LI+KKV PAY +++GaLVGGLIGG DL V+ M+ GAQG+ ++ 
Sbjct: 3 TVSAIGALVMiIVRlFLILKKVSPAYGMLVGALVGGLIGGADLSQTVSLMIGGRQGITTA 62 

Query: 64 ILRILTSGILAGALIKTGSAEKIAESIIKKLGQQRAITALAIATMIICAVGVFIDIAVIT 123 

++RIL +G+LAG LI++G+A I E+I IvLG+ RA+ ALA+ATMI+ AVGVF+D+AVIT 
Sbjct: 63 VmiLAAGVLAGVLIESGAMSITETITNKLGETRftLLAIiAIATMILT^^ 122 

Query: 124 VAPIAIAIGKKANLSKSSILLftMIGGGKAGNIISPNPimiAASEAFKVDLTSI^QN^ 183 

V+PIALA+ ++++LSK++ILLfimGGGKaGNI+SPNI>N IAA++ F + LTS+M+ IIP 
Sbjct: 123 VSPIALALSRRSDLSKAAIIiIiAMIGGGKACailMSPNPNAIAAADTFHLPLTSVMMAGIIP 182 

Query: 184 AIAALWTIII1AKIVSKKNNDISYDSEEQV--GSDLPAFLPAISGPLWICLLALRPLFG 241 

A+ L++T LAK + K + ++ D E V +LP+FL A+ PLV I LLALRPLF 
Sbjct: 183 ALFGLILTYFLAKRLINRGSKVT-DKEVIVLETQNLPSFLTALVAPLVAILLLMi^ 241 

Query: 242 ITIDPIiIALPLGGLISIIATGYLKETTOFVEYGLSKVViGVSIIirilGTGTLSGIIKaSKILQ 301

I +DPLIALPLGGLI G L+ + GLSK+ V+I+L+GTG L+GII S L+ 

Sbjct: 242 IKVDPLIALPLGGLIGAFCMGKLRNINSYAINGLSKMTPVAIMLLGTGRLAGIIANSGLK 301 

Query: 302 FDMIHLLEFLNMPTFILAPLSGIFMGAATASTTSGTTIASQTFAETLIKSGVPAVSGAAM 361 

+1 LE +P++ILAP+SG+ M ATASTT+GT +AS F+ TL++ GV +++GAAM 
Sbjct: 302 EVLIQGLEHSGLPSYILAPISGVLMSLATASTTAGTAVASNVFSSTIiLELGVSSLAGAftM 361 

Query: 362 IHftGATVLDSLPHGSFFHATGGAVNMAIKDRMKLISYEALIGLTSTIVAVVYYCFF 417 

IHRGATV D +PHGSFFHATGG+VNM IK+R+KLI YE+ +GL TIV+ + + F 
Sbjct: 362 IHRGAWFDHMPHGSFFHATGGSVNmIKERLKLIPYESAVGL^(IMTIVSTLIFGVF 417 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6675> which encodes the amino acid
sequence <SEQ ID 6676>. Analysis of this protein sequence reveals the following:

Possible site: 51

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB07616 GB:AP001520 unknown conserved protein [Bacillus halodurans] 
Identities = 155/435 (35%) , Positives = 248/435 (56%) , Gaps = 21/435 (4%) 



Query: 7 LGVLVGVIVIIYLYVKEVNIIIAAPLATSLVILENQMDPTTTLLGKEPNQFMGM.STYIL 66

LG+++G+++++ L + +11 AP+A +V LF +D hh + +M +
Sbjct: 2 LGIVI/3LVIIMVIAYRGWSIIWVRPIAAGVVaLFGGLD----LLPAYTDTYMEGFVNFAK 57 

Query: 67 NYFAIFLLGSILAKLMETSGATTSIADYILKKVGHDSPYKVLVAIFLISAILTYGGISLF 126 

+F +F+LG+I KLME +GA S+A I K +G + ++ + L A+LTYGGISLF 

Sbjct: 58 QWFPVFMIK3AIFGKLIffiDTGaaRSVASAITKLIGTK---RAILGV^aX3CAVLTO 114 



Query: 127 VVMFAVLPLARSLFKKMDLAWNLIQVPLWLGIATFTMTILPGTPAIQNVIPIQYLDTSLT 186 

W+FA+ PLA +LF++ +++ LI + LG TFTMT +PGTP IQN+IP Y T+ 
Sbjct: 115 WVFAMYPIAIjy^FREANISRRLIPGTIAIX3AFTFTMTAVPGTPQIQNLIPTSYYGTNAM 174 

Query: 187 AAAIPSIVGSIGCVAFGLFYMKYCLAKSMARGETYATYAFDNEIQVKTKNLPHFLASILP 246 

AA+++++ GY++ K GE + T +E + + + +P+ S LP 

Sbjct: 175 AAP^mGVIAaLIMGIGGYTYLVWREKKLKEflGE-FFTEPKNGEKEEEGEKVPNPWLSFLP 233 

Query: 247 LLLLIIIALTGSLFGNDFFKKNIIFIALLAVILTASWLFRQFIPNKIAVFNLGASSSIAP 306 

L+ +1+ T +L D I +AL++ 1+ L + I N GA S+ 

Sbjct: 234 LVSVIV---TnNLLQWD IVLALISGIVLIMLiaWGKVKGFIQSMNQGAGGSVLA 284 

Query: 307 IFATASAVAFGAVWIVPGFTFFSDLimiPGNPLISLAVLTSSMSAlTGSSSGALGIVM 366 

I T++AV P6+W VPGF ++L+L I G+PLIS AV + ++ TGS+SG +GI + 
Sbjct: 285 IINTSAAVGPGSVVRAVPGFERLTELLLGIQGSPLISQAVAINVLAGATGSASGGMGIAL 344 

Query: 367 PNFAQYYLDQGLNPEMIHRVATIASNIFTIVPQSGVFLTFLALTGLNHKNAFKETF 422 

+ Q ++ G++PE HRVA+IAS +P +G LT LA+TGL+HK ++K+ F 

Sbjct: 345 EALGDRYMQLAMETGMSPEAFHRVASIASGGLDTLPHNGAVLTLLAITGLSHKESYKDIF 404 

Query: 423 ITVSVSTFIAQVIVI 437 

+ V ++ I 
Sbjct: 405 WGCVIPIVSVAFAI 419 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 88/395 (22%) , Positives = 167/395 (42%) , Gaps = 40/395 (10%) 

Query: 9 GALIGLALAILLIIKKVHPAYSLILGALVGGLIGGGDLVTIV NTMVLGAQG- -MMS 62 

G 11+0+ + I L +K+V+ + L + L D T + +GA +++ 

Sbjct: 8 GVLVGVIVIIYLYVKEVNIIIAAPLATSLVILFNQMDPTTTLLGKEPNQFMGALSTYILN 67 

Query: 63 SILRILTSGIIiAGALIKTGSAEKIAESIIKKLGQQ---RAITALAIATMIICAVGVFIDI 119 

L ILA + +G+ IA+ I+KK+G + + A+ + + 1+ G+ + + 

Sbjct: 68 YFAIFLIXKILAKUIETSGATTSIADYILKKVGHDSPYKVLVAIFLISAILTYGGISLFV 127 

Query: 120 AVITVAPIALAIGKKftNLSKSSILLAMIGGGKAGNII SENENTIAASEAFKVDLTS 175 

+ V P+A ++ KK +L+ + I + +G + +P ++ LT+ 

Sbjct: 128 VMFAVLPIJffiSLFKKM)IAWNLIQVPLVEiGIATFTOTILP6TPAIQ[WIPIQYLDTSLT^ 187 

Query: 176 LMVONIIPAIAALVVTII LAKIVSKKN1S1DISY--DSEEQVGS-DLPAFLPAISGP 227 

+ +1+ +1 + + LAK +++ +Y D+E QV + +LP FL +1 

Sbjct: 188 AAIPSIVGSIGCVAFGLFYMKyOiAKSMREGETYATYAFDlffilQiVKTKm 247 

Query: 228 LWICLLALRPLPG ITIDPLIALPLGGLISILATGYLKETVPFVEYGLSICWG 280 

L++I + LFG I L+A+ L SL+++ GS + 

Sbjct: 248 LLLIIIALTGSLFGNDFFKKNIIFIALLAVIL--TASWLFRQFIPNKIAVFNLGASSSIA 305 

Query: 281 VSILLIGTGTLSGIIKASNLQFDMIHLLEFIJSIMPTFILAPLSGIFMGAATASTTSGT 337 

+ + G + 1+ D+I L P LA L+ MAT S++ 

Sbjct: 306 PIFATASAVAFGAVVMIVPGFTFFSDLI--LNIPGNPLISLAVLTS-SMSAIT6SSSGAL 362 

Query: 338 TIASQTFAETLIKSGVPAVSGAAMIHAGATVLDSL 372 

I FA+ + G+ MIH AT+ ++ 

Sbjct: 363 6IVMPNPAQYYLDQGL NPEMIHRVATIASNI 393 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2160

A DNA sequence (GBSx2277) was identified in S.agalactiae <SEQ ID 6677> which encodes the amino
acid sequence <SEQ ID 6678>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.24 Transmembrane 85 - 101 ( 84 - 101) 

Final Results 

bacterial membrane --- Certainty=0.2296 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB16041 GB:Z99124 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 176/377 (46%), Positives = 234/377 (61%), Gaps = 2/377 (0%)

Query: 1 MKVWAIDSLKGSLSSLEAGNAIKESINEVISGADVEVHPLADGGEGTVEALTLGMGGTI 60 

MK+++A DS K SLS+LEA AI+ V GAD P+ADGGEGTV++L G I 

Sbjct: 1 MKIIIAPDSFKESLSALEAAEAIERGFKSVFPGADYRKLPVADGGEGTVQSLVDATNGRI 60 

Query: 61 ETIPVKGPLGEKVHASYGIIPQRQIiAIIEMAAAAGITLIATEERNPLHTTTyGVGEMIKD 120 

V GPLGE V A +G++ + A+IEMAAA+G+ L+ ++RNPL TTT G QE+I 
Sbjct: 61 lEQVVTGPLGEPVRAFFGMMGDGRTAVIEMAAASGLHLVPVDKRNPLITTTRGTGELIGA 120 

Query: 121 AISKBCRHFIIGIGGSATI!roGGAG^D^QAIfiVaLLDK^NQEISLGAQGLADLKSISTDKVI 180

A+ 6 II6XGGSATNDGGA6M-I-QALG LLD EI G L+ L SI + 
Sbjct: 121 ALDAGAERLIIGIGGSATISroGGAGMIQALGGRLLDNSGSEIGPGGGALSQLASIDVSGLD 180 

Query: 181 EELKECDFKIACDVTNPLCGAQGCSSIFGPQKGADEDMITKMDTWLSNYATLATSVSEKA 240 
L+ ++AC+V NPL G +G +++FGPQKGA DM+ +D +S++A +A

Sbjct: 181 SRLRWKLEVAGNVDNPLTGPKGATAVFGPQKGATADMLDVLDQNVSHFADMAEKALGST 240 

Query: 241 DATIEGTGAAGGLGFAFLAFTNATLEPGIDIILSEINIEKAISEADLWTGEGRLDGQTV 300 
EG GaAGGIiG++ L + A L+ GIDI+L ++ E + +ADLV+TGEGR+D QTV 
Sbjct: 241 FRDTEGAGAAGGLGWSLLTYLQADLKRGIDIVLEAVDFESIVQDADLVITGEGRIDSQTV 300

Query: 301 MGKAPIGVAKLAKKYGKKWAFSGSVTEDAILCNQHGIDAFFPIVRRLISLDEAMSKEVA 360 

GK PIGVAK AK Y V+ +GS++ D+ QH6IDA F IV + L++A 
Sbjct: 301 HGKTPIGVAKAAKSYDVPVIGIAGSISRDSNAVYQHGinaLFSIVPGAVPLEDAFEHAAE 360 

Query: 361 YKNMKETATQVFRLINL 377 

Y M+ TA + I L 
Sbjct: 361 Y--MERTARDIAASIKL 375 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6679> which encodes the amino acid
sequence <SEQ ID 6680>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 360 - 376 ( 360 - 376)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAA57927 GB:U18997 ORF_f408 [Escherichia coli] 
Identities = 115/345 (33%), Positives = 182/345 (52%), Gaps = 25/345 (7%)

Query: 24 MKILVAIDSFKGSVTSPELNTSVAQALLSVDKQLVIETRAIADGGEGSLVALSQTVAGRW 83

MKI++A DS+K S+++ E+ ++ + + + +ADGGEG++ A+ ' G

Sbjct: 28 MKIVIAPDSYKESLSASEVAQAIEKGFREIFPDAQYVSVPVMJGGEGtVERMIiy^TQGRE 87

Query: 84 HQVKTIDLLRRPIKVAY--YRHAKQAFIESASIIGIDKITSNSVTYAQATSYGLGIiAVKD 141
' L + ++ K AFIE A+ G++ + + TS G G +

Sbjct: 88 RHAVmGPLGEKVNASWGISGDGKTAFIEMAAASGLELVPAEKRDPLVTTSRGTGELILQ 147 

Query: 142 AIQKGATQIEIMLGGTGTSDGGKBFLESUraDFMT - - -GRSYLDTIASPVTLLGL 193

A++ GAT I I +GG+ T+DGG G G L+TL + + + GL 

Sbjct: 148 ALESGATNIIIGIGGSATNDGGAGMVQRLGAKLCnftNGNEIGFGGGSIOT 206

Query: 194 T DVTNPYHGPQGFAAVFGPQKGGSLSQIEETDQIASNFAKKVFCQTTI 241 

DVTNP G G + +FGPQKG S + I E D S++A+ + + 
Sbjct: 207 DPRLKDCVIRVACDVTNPLVGDNGASRIFGPQKGASEftMIVEIJ)MILSHYAEVIKKALHV 266 

Query: 242 DLQTIPGSGaaGGLGGAIV-I.LGGTLTSGPSRIAELIJJIinNSLQSa3LVITGE^ 300 

D++ +PG+GaaGG+G A++ LG L S6 + IJSIL+ + C LVITGEG +D+QS 
Sbjct: 267 DVKDVPGAGAAGGMGRAimFIKaRELKSGIEIVTTAIJSmEEHIHDCTL 326 

Query: 301 QSGKVPVAIARMAKKYQVPTIALCGSVKIETGLAAEDFL-AVFSI 344

GKVP+ +A +AKKy P I + GS+ + G+ + + AVFS+ 
Sbjct: 327 IHGKVPIGVaUVAKKYHKPVIGIAGSLTDDVGWHQHGIDAVFSV 371 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 128/379 (33%), Positives = 194/379 (50%), Gaps = 23/379 (6%)

Query: 1 MKVVVAIDSLKGSLSSLEAGNAlKESIlffiVISGADVEVHPIiaDGGEGTVEALTL(31GGTI 60 

MK++VAIDS RGS++S E ++ +++ V +E +ADGGEG++ AL+ + G 

Sbjct: 24 MKILVAIDSFKGSVTSPELNTSVaoaLLSVDKQLVIETRAIADGGEGSLVALSQTVAGRW 83 

Query: 61 ETIPVKGPLGEKOTASYGIIPQRQLAIIEMAftAAGITLIATEERNPUITTTYGVGEMIKD 120 

+ L + +Y + A IE A+ GI I + T+YG+G +KD 

Sbjct: 84 HQVKTIDLLRRPIK\aY--YRHaKQAFIESASIIGIDKITSNSVTyAQATSYGLGIAVKD 141 

Query: 121 AISKGCRHFII6IGGfiAT^^X3GAGMLQAMYAIJ:iDKDNQEISLGAQ6IlADLKSISTDKVI 180
AI KG I +GG+ T+DGG G L++I1 Y + G + L ++++ + 

Sbjct: 142 AIQKGATQIEIMLGGTGTSDGGKGFLESUSIYDFMT GRSYLDTLASPVTL 190 

Query: 181 EELKECDFKIACDVTNPLCGAQGCSSIFGPQKGADEDMITKMDTWLSNYATLATSVSEKA 240 
I) DVTNP G QG +++FGPQKG I + D SN+A +

Sbjct: 191 LGLT DVTNPYHGPQGFAAVFGPQKGGSLSQIEETDQIASNFAKKVFCQTTID 242 

Query: 241 DATIEGTGAfiGGLGFAFLAFTNATLEPGIDIILSEINIEKAISEftDLVVTGEGRLDGQTV 300 
TI G+GAAGGLG A + TL G I +N++ ++ DLV+TGEG LD Q+ 
Sbjct: 243 LQTIPGSGA!«3GLGGA-IVIjLG6TLTSGFSRIAELIJSr™SLQS<^^ 301

Query: 301 MGKAPIGVAKLAKKYGKKWAFSGSVTEDAILCNQHGIDAFFPIVRRLISLDEAMSKEVA 360 

GK P+ +A++AKKY +A GSV + L + +AFI++ ISL+ A+ K 
Sbjct: 302 SGKVPVAIARMAKKYQVPTIALCGSVKIETQLAAEDPL-AVFSIQQQPISLEftAIDKrTT 360 

Query: 361 YKNMKETATQVFRLINLYN 379 

N+K A + LI +N 
Sbjct: 361 LSisriKILAaNLMLLIAQFN 379 

SEQ ID 6678 (GBS409) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 76 (lane 7; MW 45.4kDa).

GBS409-His was purified as shown in Figure 214, lane 6.

GBS409d was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 166 (lane 3 & 4; MW 35kDa) and in Figure 188 (lane 12; MW 35kDa). Purified protein is
shown in Figure 240, lanes 9-10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2161 

A DNA sequence (GBSx2278) was identified in S.agalactiae <SEQ ID 6681> which encodes the amino
acid sequence <SEQ ID 6682>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1886 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21771 GB:U32695 conserved hypothetical protein [Haemophilus influenzae Rd]

Identities = 97/383 (25%), Positives = 175/383 (45%), Gaps = 52/383 (13%)

Query: 1 MKLRKQLAQQIVTSIiOJVCQQDINFINTRGIIFASTNPKRVGEFHEIGLKVAQTGQMIEV 60 

M+L K A++IV + +N ++ G+I AS N R+ + H + + +++E+ 

Sbjct: 1 MQLDKYTAKKIVKRAMKIIHHSVNVMDHDGVIIASGNSTRLNQRHTGAVLALRENRWEI 60

Query: 61 TD---QESYFGTQAGINIPFYyNCEL]:ATIGISGNPNQVGKYALIiAQKMTRLILKEHE-L 116 

Q+ F Q GIN+P +Y + + +GISG P QV +yA L + LI+++ L 
Sbjct: 61 DQAI^KWNPEAQPGINLPIira^KNIGVVGISGEPTQVKlQYAELVKMTAEIiIVEQQRLI. 120 

Query: 117 DYLDFGRKNEASIVLHHLVEGRELDYyYLNQFLNQYHLSEKTDYRLLTFEINSQKQKLLL 176

+ + R+ + +Ij L+ LN + ++ + +F++N + +L+ 

Sbjct: 121 EQESWHRRYKEEFILQ LIfflCNIiNWKEMEQQa.--KFFSFDGNKSRVVVLI 167 

Query: 177 S -QSEMSLLNFKDK--- -— LDTAIYTENYPNQYWLLLSDHMFDYYYPNI 219

+ +L+N+ ++ LD + + N +LS M 
Sbjct: 168 KmiPA[jDNLQNLINYLEQSEFAQDVAILSLDQVVVLKTWQNS--TVLSAQM KT 219 

Query: 220 LSKFECEKiGrjYKVGIGQKSSLSLLKR---SYETSILALK-ALKGQQK- -VNLVDDLDLEL 273 

L + K YK+ +G +L L ++ S++++ L LK + + + D+ L +

Sbjct: 220 LLPADYSKQDYKIAVGACLNLPLFEQLPLSFQSAQSTLSYGLKHHPRKGIYVFDEHRLPV 279 

Query: 274 LLTSIDSNIKQYVimaLVNL-SENDKIL---IJiISYFKHNLSLKECSQELFIHKNTVQYR 329 
LL+++ LKLLSE+IL LYFNL +++LF+H NT++YR 
Sbjct: 280 LLAGLSHSWQGNELIKPLSPLFSEENAILYKTLQQYFLSNCDLYLTAEKLFVHPNTIiRYR 339

Query: 330 LNKIYESTQUJPRUFKDATLLYL 352 

USKI + T L D LYL 

Sbjct: 340 INKIEQITGLFFNKIDDKLTLYIi 362 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2162 

A DNA sequence (GBSx2279) was identified in S.agalactiae <SEQ ID 6683> which encodes the amino
acid sequence <SEQ ID 6684>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0290 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF89979 GB:AF206272 beta-glucosidase [Streptococcus mutans] 
Identities = 334/475 (70%) , Positives = 392/475 (82%) , Gaps = 8/475 (1%) 



Query: 4 FPKHFLWGGAVAANQVEGAFRTDGKGLSVQDVLPNGGLGD' FTAKPTPDNLKLE 56
FP++FI.WGGA AftNQ EGA+ DGKGLSVQDV P GG+ T KPT DNLKL
Sbjct: 6 FPENFLWGGATAftNQFEGAYNQIXSRGLSVQDVTPKGGVaQSGSSSPLITEKPTEDNLKLV 65

Query: 57 AIDFyHNyK!!roiKI.FAEMGFBCVFRTSIAWSRIFENGDDSAPNEAGrOFYDNLFDELL^ 116 

IDFy+ YK DI LFAEMGFKVFR SIAW+RIFPNGDD PNEAGL FYD +FDEL Ky+ 
Sbjct: 66 GIDFYmYKEDIALFAEMGFKVFRLSIAWTRIFPNGDDLEPJSffiAGriAFYDKVFDELAKYD 125 

Query: 117 lEPLWLSHYETPIJIIJUCrYNGVaDRRLIAFFEKFaQTVMERYKDKVKX^ 176 

IEPLVTIiSHYETPLHIiA+ YNGWA+R IjIAF+E++A+TV RYKDKVKSfWLTENEVNS+L . 
Sbjct: 126 lEPLOTLSHYETPLHLftRKYNGVffiNRELIAFYERYARTVFTRYKDKVKYWLTFl^ 185 

Query: 177 HMPFTSGAIMTDKSQLSPQELYQAIHHELVASARVTKLGRSINPNFKIGCMILAMPAYPM 236 

H PF SG I+TD QLS Q+LYQA+HHELV SA TK+G INP+FKIGCM+LAMPAYPM 
Sbjct: 186 HAPFMSGGIITDPEQLSKQDLYQAVHHELWSALATKTOHEINPDFKIGCMVLAMPAYPM 245 

Query: 237 TSDPRDVLAARQFEQHinJLFSDIHWGKyPTYIQSYFKNNGIKIKFEEGDEEVLAQNTVD 296 

T+DP D LA R+FE N LFSD+H RGKYP YI+ YFK+N I IK EGD+E++ +NTVD 
Sbjct: 246 TADPI^QLAVREFENQireLFSDLHARGKYPNYIKRYFKDNNIDlKMGEGDKEI^ 305 

Query: 297 FLSFSYYMSVTQAYDFENYQSGQGNILGGLTNPHLTTSEWGWQIDPIGLRLVENQYYERY 356 

F+SFSYYMSV A++ E+Y SG+GN+LGGL+NP+L SEWGWQIDP+GLRLVLN Y+RY 
Sbjct: 306 FISFSYYMSVAAAHNPEDYNSGRGNVLGGt.SNPYLQASEWGWQIDPVGLRLVLNDSYDRY 365 

Query: 357 QIPLFIVKNGLGAKDQLIETLDGDYTVEDDYRIDYMNQHLVQVAKAIEDGVEIMGYTSWG 416 

Q+PLFIVENGLGAKD L++ DG TVEDDYRIDY+ +HL+QV +A++DGV+++GYT+WG 
Sbjct: 366 QLPLFIVENGI<3AKDVLVQGPDGP-TVEDDYRIDYI<2KHLMQVGEALQDGVDLLGYTTWG 424 

Query: 417 CIDCVSMSTAQLSKRYGLIYVDRNDDGTGSLQRYKKKSFGWYQKVIKTNGQSLFE 471 

ID VS ST +LSKRYG lYV NDDG+GSL RYKKKSF WY+KVI+TNG SL+E 
Sbjct: 425 PIDLVSESTVELSKRYGFIYVACNDDGSGSLaRYKKKSFAWYKKVIETNGASLYE 479 
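
Each database hit in these listings is summarised by a line of the form "Identities = a/b (x%), Positives = c/b (y%)", optionally followed by a Gaps field. A minimal sketch of how such a summary line might be read into numeric form is given below. The names `SUMMARY_RE` and `parse_summary` are hypothetical, and the fraction computed from the raw counts need not equal the printed integer percentage, since the rounding convention of the program that produced the listing is not stated here.

```python
import re

# Matches "Identities = 334/475", "Positives = 392/475", "Gaps = 8/475"
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Map each field name to (count, aligned_length, fraction)."""
    out = {}
    for name, num, den in SUMMARY_RE.findall(line):
        count, length = int(num), int(den)
        out[name.lower()] = (count, length, count / length)
    return out


summary = parse_summary(
    "Identities = 334/475 (70%), Positives = 392/475 (82%), Gaps = 8/475 (1%)"
)
for field, (count, length, fraction) in summary.items():
    print(f"{field}: {count}/{length} = {fraction:.3f}")
```

Working from the raw counts rather than the printed percentages avoids depending on how the percentages were rounded.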

A related DNA sequence was identified in S.pyogenes <SEQ ID 5287> which encodes the amino acid
sequence <SEQ ID 5288>. Analysis of this protein sequence reveals the following:

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0763 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 390/469 (83%), Positives = 423/469 (90%)

Query: 1 ^m^PKHFLWGGAVAANQVEGAFRTDGKGLSVQlmlENGGLGDFTAKPTPDNLKLEAIDF 60

M +FPK FLWGGAYARNQVEGAP D KGLSVQDVLPNGGLG++T PT DHL LEAIDF
Sbjct: 1 MGIFPKDFLWGGAVRRNQVEGAFEADAKGLSVQDVLENGGLGEWTDSPTSDNLTLEAIDF 60

Query: 61 YHNYKNDIKLFAEMGFKVFRTSIAWSRIFPNGDDSAPNEAGLQFYDNLFDELLKYNIEPL 120

YH YK DI LFAEMGFKVFRTSIAWSRIFPNGDD PNEAGLQFYD+LFDELL Y lEPL
Sbjct: 61 YHRYKEDIALFAEMGFKVFRTSIAWSRIFPNGDDDQPNEAGLQFYDDLFDELLNYGIEPL 120


Query: 121 VTLSHYETPLHLAKTYNGWADRRLIAFFEKFAQTVMERYKDKVKYWLTENE™ 180 

VTLSHYETPLHIAK YNGW DRRLI FFE+FAQTVMERYKDKVKYWLTENEVNSILHMPF 
Sbjct: 121 VTLSHYETPLHLAKAYNGWTDRRLIGFFERFAQTVlffiRYKDKVKYWLTFNEVNSII.HMPF 180

Query: 181 TSGAIMTDKSQLSPQELyQAIHHELVaSARVTKLGRSINENFKIGCMILAMPAYPMTSDP 240

TSG IMT+K +LS Q+LYQAIHHELVASA VTKL INP+ K+GCMILAMPAYPMTSDP 
Sbjct: 181 TSGGIMTEKEKLSLQDLYQAIHHELVASASVTKLAHEINPDVtCVGCMIIiAMPAYPMTSDP 240 



Query: 241 RDVLAARQFEQHjaLFSDIHTOGKYPTYIQSYFKIfflGIKIKFEEGDEEVLAQNTVDFLSF 300 

RD+LRA FE NLLFSDIHVRGKYP+YI+SYPK NGI+I PE+GD+E+LA++TVDFLSF 
Sbjct: 241 RDILAaHAFENIl^mLFSDIHTOGKyPSYIKSYFKENGIEIVFEDGDKELIlAEHTVDFLSF 300 



Query: 301 SYYMSVTQAYDFENYQSGQGNILGGLTNPHLTTSEWGWQIDPIGLRLVLNQYYERYQIPL 360 

SYYMSVTQA++ E Y SGQGNILGGL+NP+L +SEWGWQIDPIGLRLVLNQYY+RYQIPL 
Sbjct: 301 SYYMSVTQaHNPEAYTSGQGMILGGLSNPYLESSEWGWQIDPIGLRLVIiNQYYDRYQIPIi 360 

Query: 361 FIVENGLGAKDQLIETI^GnXTVEDDYRIDYMNQHLVQVaKAIEDGVEIMGYTSWGCIDC 420 

FIVEaSIGIiGAKDQL++T DG TV DDYRIDYM+QHLVQVAKAIEDGVE+MGYTSWGCIDC 
Sbjct: 361 FIVENGLGAKDQLVQTADGS^r^VHDDYRIDYMSQHLVQVAKAIEDGVEVMGYTSWGCIDC 420 

Query: 421 VSMSTAQLSKRYGLIYVDRNDDGTGSLQRYKKKSFGWYQKVIKTNGQSL 469
VSMSTAQLSKRYG lYVDRNDDGTG L RYKKKSF WY++VI+TNG+ L 
Sbjct: 421 VSMSTAQLSKRYGFIYVDRMDDGTGQLTRYKKKSFDWYRQVIQTNGRYL 469 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
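
The alignment summaries in these examples report identities and positives as a fraction and a whole-number percentage. The following is a minimal sketch, not part of the original analysis, of how such a summary line could be parsed and the percentages recomputed from the fractions. The function names are hypothetical, and the reported percentages appear to be truncated or rounded by the alignment program, so small differences from the recomputed values are to be expected.

```python
import re

# Matches fields such as "Identities = 390/469 (83%)"; spacing is tolerated.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {label: (count, aligned_length, reported_percent)}."""
    return {
        label: (int(count), int(length), int(percent))
        for label, count, length, percent in SUMMARY_RE.findall(line)
    }


def recomputed_percent(count, length):
    """Percentage recomputed from the fraction, without rounding."""
    return 100.0 * count / length


summary = parse_summary("Identities = 390/469 (83%), Positives = 423/469 (90%)")
for label, (count, length, reported) in summary.items():
    value = recomputed_percent(count, length)
    print(f"{label}: {count}/{length} = {value:.2f}% (reported {reported}%)")
```

Comparing the recomputed value with the reported one is a quick way to spot digits that were misread when the text was scanned.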

Example 2163 

A DNA sequence (GBSx2280) was identified in S.agalactiae <SEQ ID 6685> which encodes the amino 
acid sequence <SEQ ID 6686>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 
Final Results

bacterial membrane --- Certainty=0.5161(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA84286 GB:Z34526 beta-glucoside permease [Bacillus subtilis]
Identities = 225/594 (37%), Positives = 351/594 (58%), Gaps = 11/594 (1%)

Query: 4 YQETAKAIIJUIVGGEKNIQHVTHCVTRLRLVLDNDEIVNDQVIKTIPNVIGVMRKNDQYQ 63
Y + +K IL VGGE+N+Q V HC+TRLR L ++ + ++ +P V+G +Q+Q 
Sbjct: 3 YDKLSKDILQLVGGEENVQRVIHCMTRLRFNLHDNAKADRSQLEQLPGVMGTNISGEQFQ 62 

Query: 64 IILGNDVKNYYliiaFLALGHFENTTREFSSQKKSSILEKLIETIAGVirPLIPALLGGGML 123 

II+GNDV Y A + + + SS +K ++L + + I+GV TP++PA+ G GM+ 

Sbjct: 63 IIIGNDVPKOTQAIVRHSNLSDBKSAGSSSQKKNVLSAVFDVISGVFTPILPAIAGAGMI 122 

Query: 124 KVIGILLPMLGIASSSSQTVAFINFFGDAAYYPMPIMIAYSRASRFKVTPVLAATVGGIL 183

K + L G + SQ + GD A+YF+P+++A SAA +F P +AA + + 
Sbjct: 123 KGLVALAVTFGWMAEKSQVHVILTAVGDGAFYPLPLLLAMSAARKPGSNPYVAAAIAAAI 182 

Query: 184 LHPAFVTMVAEGKPLSLFGAPVTLASYGSSVIPILIMVFLMQYIERWINKIVPSVMKSFL 243 

LHP ++ GKP+S G PVT A+Y S+VIPIL+ +++ Y+E+WI++ + +K + 
Sbjct: 183 LHPDLTALLGAGKPISFIGLFVTAATYSSTVIPILLSIWIASYVEKWIDRFTHASLKLIV 242 



Query: 244 QPTLIIL1SGFLALWVGPLGVIIGK6LSSAMLSIYHVAPWLALSILGAIMPLWMTGMH 303 
PT +LI L L+ VGPLG I+G+ LSS + ++. A +A+ +l' ' L++MTGMH
Sbjct: 243 VPTFTLLIWPLTLITVGPLGAILGEYLSSGVNYLFDHAGLVaMILLAGTFSLIIMTGMH 302 

Query: 304 WAFAPIFLAASVATPDVLILPAMriASNIiAQGAASLAVAVKAKQKQTRQVAPAAGLSALIiA 363 
+AF PI + +LPAM +N+ Q AS AV ++++ K+ + +A ++AL+

Sbjct: 303 YAFVPIMINNIAQNGHDYLLPAMETjAimGQAGASFAVFIiRSimKKFKSL^ 361 

Query: 364 GITEPALYGVTLKFKKPLYAAMISGGLVGAYIGLVNIASYTFWPSIIGLPQYINPQGGN 423 

GITEPA+YGV ++ KKP AA+I G GA+ G+ +ASY +V GLP I G 
Sbjct: 362 GITEPAMYGVNMRLKKPFAAALIGGAAGGAFYGMTGVASY- -IVGGNAGLPS-IPVFIGP 418

Query: 424 NPSNAVIAAIATIILTFIITWFLGIDEGENEKSSINAQEHTHIRSGLSKKETLYSPMVGN 483 

F A+I + + LG ++ ++ S Q H S +E ++SP+ G 
Sbjct: 419 TPIYBMIGLVIAFAaETAAAYLLGFEDVPSDGSQ---QPAVHEGS REIIHSPIKGE 471 


Query: 484 VLPLSKVPDETFSSKLLGEGLAITPSVGEVYAPFDGEIISLFPTKHAIALKDDKGVEVLI 543 

V LS+V D FS+ ++G+G AI P GEV +P G + ++F TKHAI + D+G E+LI 
Sbjct: 472 VKALSEVKDGVFSAGVMGKGFAlEPEEGEWSPVRGSVTTIFRTKHAIGITSDQGZffilLI 531 

Query: 544 HIGIDTVEUJGESFEQLVKVGDPVKRGQLLLRMDIDPISSKGYSLISPVVVTNS 597

HIG+DTV+L G+ F +K GD V G L+ D++ I + GY +I+PV+VTN+ 
Sbjct: 532 HIGLDTVKLEGQWFTAHIKEGDKVAPGDPLVSFDLEQIKAAGYDVITPVIVTNT 585 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2883> which encodes the amino acid
sequence <SEQ ID 2884>. Analysis of this protein sequence reveals the following:

Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.5161(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 508/619 (82%), Positives = 561/619 (90%), Gaps = 1/619 (0%) 

Query: 4 YQETAKAIIAAVGGEKNIQHVTHCVTRLRLVLDNDEIVNDQVIKTXPNVIGWIRKNDQYQ 63 
YQETAKAILaAVGG+ NIQ VTHCVTRLRLVL NDE V DQ +K I NVIGVMRKN QYQ

Sbjct: 3 yQETAKAIIAAVGGKTNIQRVTHCmM:JRLVLKm3EKOT<I)QQVKAISWIGV^mKNG^ 62 

Query: 64 IILGNDVNNYYlilAFLALGHFENTTREFSSQKKSSILEKLIETIAGVITPLIPALLGGGML 123 
IILGNDVNNYY AFL+LGHF+N + SS+ K SILE+LIETIAGVITPLIPALLGGGJIL 
Sbjct: 63 IILGNDVHNYYQAPLSLGHFDNQDEDHSSKAKBSILERLIETIAGVITPLIPALLGGGML 122

Query: 124 KVIGILLPMLGIASSSSQTVAFINFFGDAAYYFMPIMIAYSAASRFKVTPVLAATVGGIL 183 

KV+GILLPMLG+AS+ SQTVAFINFFGDAAYYFMP+MIAYSAA+RFKVTPVLflAT+ GIL 
Sbjct: 123 KS7VGILLP^ttJGLASADSQTVAPINFPGDAAYYFMPVMIAYSAAftRFKVTPVLaATIAGIL 182 


Query: 184 LHPAFVTMVREGKPLSLFGAPVTLRSYGSSVIPILIMVFLMQYIERWINKIVPSVMKSFL 243 

LHPAFV MVAEGKPL+LFGAPVT ASYGSSVIPIL+MV+LMQYIE+W+N++VPSVMKSFL 
Sbjct: 183 LHPAFVRMVREGKPLTLFGAPOTPASYGSSVIPILMMVYLMQYIERWVNRLVPSVMKSFL 242 

Query: 244 QPTLIILISGFriALVWGPLGVIIGKGLSSflMLSIYHVAPWLALSILGAIMPLWMTGMH 303

QPTLIILISGFLALVWGPLGVIIG+GLS+ ML+IYHVAPWLAL+ILGAIMPLWMTGMH 
Sbjct: 243 QPTLIILISGFLALVWGPLGVIIGQGLSNTMLAIYHVAPWLALAILGAIMPLVVMTGMH 302 

Query: 304 WAFAPIFLAASVATPDVLILPAMLASNIAQGAASLAVAVKAKQRQTRQVAFAftGLSALLA 363 
WAFAPIFIAASVATPDVLILPANILASNLAQGAASLAVA K KQKQTRQVA AAG+SALLA
Sbjct: 303 WAFAPIFLi^SVATPDVLILPAMIiASNLAQGAASIAVAFKTKQKQTRQVALAAGISALLA 362

Query: 364 GITEPALYGVTLKFKKPLYAAMISGGLVGAYIGLVNIASYTFWPSIIGLPQYINPQGGN 423

GITEPALYGVTLKFKKPLYAAMISGGLVGa+IG VNIASYTFWPSIIGLPQYIMP GG
Sbjct: 363 GITEPALYGVTLKFKKPLYAAMISGGLVGAFIGFVNIASYTFWPSIIGLPQYINPSGGA 422

Query: 424 NFSNAVIAAIATIILTFIITWFLGIDEGENEKSSINAQEHTHIRSGLSKKETLYSPMVGN 483

NF+NA+IA ATI+L F +TWF+GIDE E+ K A + + ++SGLS K+TLY+PM G
Sbjct: 423 NFTtlALIAGTATIVLAFSLTWFMGIDE-ESPEQVSVAADMSQVKSGLSTKQTLYAPMTGE 481

Query: 484 VLPLSKVPDETFSSKLLGEGLAITPSVGEVYAPFDGEIISLFPTKHAIALKDDKGVEVLI 543

+L LS+VPDETFSSKLLGEG AI PS GEVYAPFDGE+I+ FPTKHA+ALK+ +GVEVI.I
Sbjct: 482 MDFLSEVPDETFSSKIiLGEGFAILPSEGEVYAPFDGEVITFFPTKHAVALKNTRGVEVLI 541

Query: 544 HIGIDTVEtNGEGFEQLVKTODFVKRGQLIJLR^roIDFISSKGYSLISPVVVTNSIDQLEI 603

H+GIDTVEL G+GFEQLV VGD VKRGQ LL+MDrDFI+SKGYSLISPWVTNS +QIjEI
Sbjct: 542 HVGIDTVELKGQGFEQLVSVGDWKRGQALLKMDIDFITSKGYSLISPVWTNSAEQLEI 601

Query: 604 IVKDAETMTOJEDDLLVIL 622

I++D + MVT ED LLVIL
Sbjct: 602 IIQDDKKMVTKEDALLVIL 620

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2164 

A DNA sequence (GBSx2281) was identified in S.agalactiae <SEQ ID 6687> which encodes the amino
acid sequence <SEQ ID 6688>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1148(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15944 GB:Z99124 transcriptional antiterminator (BglG family)
[Bacillus subtilis]
Identities = 118/275 (42%), Positives = 183/275 (65%)



Query: 1 MIIKRVLNHNAVISVTHQGLDVLLMGKGIAFKKRIGDRINSDAIEKSFVLKNSDNMNRFT 60

M I +V+N+N + V QG ++++MG+G+AF+K+ GD ++ lEK F L N D +F
Sbjct: 1 MKIAKVINNNVISVVNEQGKELVVMGRGIJ^QKKSGDDVDEARIEKVFTL^ 60

Query: 61 ELFITVPEEVVACSERIIin^KIKIiGKNIDEILYINLTDHIHSAlERHEQGMVIQNPI^ 120

L +P E + SE 11+ K++LGK L++ +y++LTDHI+ AI+R+++G+ I+N L
Sbjct: 61 TI■LYDIPIEC^ffiVSEEIIHyAKI<2LGKKIM)SIYVSLTDHINPAIQRNQKGIlDIK]^ 120

Query: 121 EIQRYYPDEYSIGMKALELIKDELGICLTIDESAFIAMHFVNAGLDNPFNEAHKITEIVS 180

E +R Y DE++IG +AL ++K++ G+ L DE+ FIA+H VNA L+ IT+++
Sbjct: 121 ETKRLYKEEFAIGKEALVMVKNKTGVSLPEDEAGFDiLHIVNAELtffiEMENIIOT 180

Query: 181 YIEQK\nKIDFRTELDESSIDYYRF^OHTKI.FAQRVLSGMKyEDDIffiDIiLVVKKKyPREy 240

I VK F+ E +E S+ YYRF+TH K FAQR+ +G E D LL VK+KY R Y
Sbjct: 181 EILSIVKYHFKIEFNEESLHYYRFVTHLKFFAQRLFWGTHMESQDDFLIiDTVKEKYHRAY 240

Query: 241 KCVKEIGNNMAIQYQYQIJJSSELLYLTVHVKRLVK 275

+C K+I + +Y+++L S EI)LYLT+H++R+VK
Sbjct: 241 ECTKKIQTYIEREYEHKLTSDELLYLTIHIERWK 275

A related DNA sequence was identified in S.pyogenes <SEQ ID 6689> which encodes the amino acid
sequence <SEQ ID 6690>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0680(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 220/279 (78%), Positives = 246/279 (87%) 

Query: 1 MIIiOlVIJSHNaVISVTHQGLDVIIMGRGIAFKKRIGDRINSIjaiEKSFVLK^^ 60 
M+IKRVnNHNA. IS HQGLD+LLMGKGI F K++GD I +AIE SFVLKNSDNMNRFT

Sbjct: 1 MLIKRVtNHNaAISTNHQGLDILLMGKGITFGKKVGDSIELNAIETSFVLKNSDNPIN^ 60 

Query: 61 ELFITVPEEWACSERIINLGKIKLGKNLDEILYINLTDHIHSAIERHEQGMVIQNPLRL 120 
ELFITVP+EWACSERIINLGKIKLGK LDEILYINLTDHIHSAIERHEQGM+I NPLR 
Sbjct: 61 ELFITVPQEWACSERIINLGKIKLGKTLDEILYINLTDHIHSAIERHEQGMLIHNPLRW 120

Query: 121 EIQRYYPDEYSIGMKALELIKDELGICLTIDESAFIAMHFVNaGLDNPENEAHKITEIVS 180 

EIQRYYPDEYS+G+KALEI1I+ LG+ L IDE+AFIAMHFVNA LD PF E H++TEIVS 
Sbjct: 121 EIQRYYPDEYSLGVKRLELIERIttiGVTLAIDEaAPIAMHFVNASLDTPFKEPHRLTEIVS 180 


Query: 181 YIEQKVKIDFRTELDESSIDYYRFMTHTKLFAQRVLSGMKYEDDDADLLLWKKKYPREY 240 

YIEQK+K DF+TEI.D++SIDYYRFMTH KLFAQRVLS M Y+DDDA+LLLWK KYP+EY 
Sbjct: 181 YIEQKIKTDFKTEIiDDTSirJYYRFbrailKLFAQRVLSQMSYIJDDDftELLLV^n^ 240 

Query: 241 KCVKEIGNlin«IAIQyQYQIJJSSELLYLTVHVKRLVKNLKE 279

+CV +1 + +Y Y LNSSELLYLTVHVKRLVK+LKE 
Sbjct: 241 RCVLDISEEIKKRYNYHIiNSSELLYLTVHVKRLVKHLKE 279 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 2165 

A DNA sequence (GBSx2282) was identified in S.agalactiae <SEQ ID 6691> which encodes the amino
acid sequence <SEQ ID 6692>. Analysis of this protein sequence reveals the following: 

Possible site: 16 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1104(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9335> which encodes amino acid sequence <SEQ ID 9336> 
was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 6693> which encodes the amino acid
sequence <SEQ ID 6694>. Analysis of this protein sequence reveals the following:

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3314(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 143/178 (80%), Positives = 161/178 (90%)

Query: 1 MTIiHHDKHimTYVMmNaMiEKHPEIGEDLEMIiaDVSQIPEDIRQaVIimGGGI^^ 60

MTLHHDKHHATYVAN NAALEKHPEIGE+LE LLADV+H-IPEDIRQ H-INNGGGHUSIHAL 
Sbjct: 24 MTIMDIOlHaTWJWimALEKHPEIGENLEELliADVTKIPEDIRQTLINNGGGHLMH^ 83 

Query: 61 FWELMSPEETQISQELSEDINaTFGSFEDFKMFTAAATGRFGSGWAWLVVNJiEGKLEVL 120 

FWEL+SPE+ ++ ++++ 1+ FGSF+ FK FTARATGRFGSGWRWLWN EG+LE+ 
Sbjct: 84 FWELLSPEKQDVTPDVAQAIDDAFGSFDAFKEQFTAaATGRPGSGWAWLVVNKEGQLEIT 143 

Query: 121 STANQDTPIMEGKKPILGLDVWEHAYYLIWRNTOPNYIKAFFEIINWNKOTJELYQAAK 178 

STANQDTPI EGKKPIL LDVWEHAYYIiNYRNVRPNYIKAFFEI+NW KV+ELYQAAK 
Sbjct: 144 STANQDTPISEGKKPILALimffiHAYYIiNYiaiVRENYIKAFPEIV™ 201 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2166 

A DNA sequence (GBSx2283) was identified in S.agalactiae <SEQ ID 6695> which encodes the amino 
acid sequence <SEQ ID 6696>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3331(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2167 

A DNA sequence (GBSx2284) was identified in S.agalactiae <SEQ ID 6697> which encodes the amino 
acid sequence <SEQ ID 6698>. This protein is predicted to be DNA polymerase III delta subunit. Analysis 
of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0511(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9743> which encodes amino acid sequence <SEQ ID 9744> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6699> which encodes the amino acid 
sequence <SEQ ID 6700>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.22 Transmembrane 250 - 266 ( 249 - 266)

Final Results

bacterial membrane --- Certainty=0.1489(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 222/340 (65%), Positives = 282/340 (82%)

Query: 1 MIAIEEIGRITPim/3LVTVIAGEDW3QYAQMKEKLFQVIGENKDDIiAySYPDLSEEDYQ 60 

MIAIE+I +++ +NLGL+T++ G+D+GQY+Q+K +L + I F+KDDLAYSYFD+SE YQ 
Sbjct: 1 MIAIEKIEKLSKENLGLITLVTGDDIGQYSQLKSRLMEQIAFDKDDLAYSYFDMSEAAYQ 60 


Query: 61 NAELDLESLPPLSDYKWIFDQFQDITTDKKTYLDEQAMKRFEAYLQNFVDTTRLVICAP 120 

+AE+DL SLPF ++ KWIFD DITT+KK++I1 E+ +K FEAYL+NP++TTRL+I AP 
Sbjct: 61 DAEMDLVSLPFFAEQKWIFDHLLDITTNKKSFLKEKDLKAFEAYLENPLETTRLIIFAP 120 

Query: 121 GKLDGKRRLVKLLKRDARVLEANTLKESDLKTYFQKYAHQEGLVFEAGVFDELLIKSNYD 180

GKLD KRRLVKLLKRDA VLEAN LKE++L+TYFQKY+HQ GL FE+G FD+LL+KSN D 
Sbjct: 121 GKLDSKRRLVKLLKRDALVLEANPLKEAELRTYFQKYSHQLGrfiPESG3iFDQriLLKS]Sro 180 

Query: 181 FSDTLTNIAFLKSYRTDGHISSNDVREAIPKSLQIMJIPDLTQDVLICRIDLARDLVR^ 240 
FS + N+AFLK+YK G+IS D+ +AIPKSLQDNIFDLT+ VL G+ID ARDL+ DIiR

Sbjct: 181 FSQIMKNMAFIiKAYKKTGNISLTDIEQAIPKSLQDNIFDLTRLVLGGKIDAARDLIHDLR 240 

Query: 241 LQGEDEIKLIAIMLGQFRMFLQVKILASKGKSESQIVSELSHYIGRKINPYQVKFAVRDS 300 

L GED+IKLIAIMLGQFR+FLQ+ ILA K+E Q+V LS +GR++NPYQVK+A++DS 
Sbjct: 241 LSGEDDIKLIAIMLGQFRLFLQLTIIARDVKNEQQLVISLSDILGRRVNPYQVKYALKDS 300

Query: 301 RNLPLAFLKEAIRILIETDYAIKRGTYDKDYLPDLALLKI 340 

R L LAFL A++ LIETDY IK G Y+K YL D+ALLKI 
Sbjct: 301 RTLSLAFLTGAVKTLIETDYQIKTGLYEKSYIiVDIALLKI 340 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2168 

A DNA sequence (GBSx2285) was identified in S.agalactiae <SEQ ID 6701> which encodes the amino 
acid sequence <SEQ ID 6702>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3071(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2169 

A DNA sequence (GBSx2286) was identified in S.agalactiae <SEQ ID 6703> which encodes the amino
acid sequence <SEQ ID 6704>. This protein is predicted to be esterase. Analysis of this protein sequence
reveals the following:

Possible site: 26

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -0.32 Transmembrane 175 - 191 ( 175 - 191) 



Final Results

bacterial membrane --- Certainty=0.1128(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB17013 GB:L38252 esterase [Acinetobacter lwoffii]
Identities = 63/218 (28%), Positives = 107/218 (48%), Gaps = 3/218 (1%)

Query: 105 KVIFYVHGGSYIHQRSELQYIFVNKLAKKLDAKVVFPiyPKAPTYNYSDAIPKIKKLYQN 164
++IF++HGG++ + + LA + +V+ yp AP + Y +AI I +YQ
Sbjct: 73 QLIFHIHGGAFFLGSLNTHRALMTDLAARTQMQVIHVDYPLAPEHPYPKAIDAIFDVYQA 132

Query: 165
PK 11+ G+S G LAL L L + P +IL+SP+i:iD+
Sbjct: 133

Query: 225
QK D LIj LQ ++ +P+VSPL+ + + P +G+ +1 D++
Sbjct: 193

Query: 285
+K + ++K H+ + M H + + PEA+ A
Sbjct: 252


There is also homology to SEQ ID 3498.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2170 

A DNA sequence (GBSx2287) was identified in S.agalactiae <SEQ ID 6705> which encodes the amino 
acid sequence <SEQ ID 6706>. This protein is predicted to be purine nucleotide synthesis repressor.
Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2970(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB16124 GB:Z99124 similar to transcriptional regulator (LacI
family) [Bacillus subtilis]
Identities = 111/300 (37%), Positives = 175/300 (58%), Gaps = 4/300 (1%)

Query: 1 MTSISDIAKKAGVAKSTVSRVIlJHHPHVSDETRQKVMaLITELDYIPNQLAEDLSRGKTQ 60

M +1 +IA+ A V+ STVSRV+NHHP+VS+E R+ V ++ ELDY PN+ A DL RGKT 
Sbjct: 1 MANIKEIARLANVSVSTVSRVLNHHPYVSEEKRKLVHQVMKELDYTPNRTAIDLIRGKTH 60 

Query: 61 KIGWIPHTRHPYFTQLINGLLDAAKTTDYQLVMMPSDYNQELELSYLKQLKMEAIDALI 120 
+GV++P++ HP F +++NG+ AA +Y ++P++YN ++E+ YL+ L+ + ID LI

Sbjct: 61 TVGVILPYSDHPCFDKIVNGITKAAFQHEYATTLLPTNYNPDIEIKYLELLRTKKID6LI 120 



Query: 121 FTSRAISLDIIETYAKYGRIWCEKLQEYNHLSSAYLDRYSSFLEAFSDMKLRGLEHLVL 180 
TSRA D I Y +YG ++ CE + + + A+ DR +++ E+F +K RG E++ 
Sbjct: 121 ITSRANHWDSILAYQEYGPVIACEDTGDID-VPCAFNDRKTAYAESFRYLiKSRGHENIAF 179

Query: 181 LFSRISI]SrESSATYQSALLAYQEVYGQLSSPY^OTG^M^DFNDG-Ilm:JSYQLVK^ 239 

R + S + AY+ V G+L +m G +D HDG L + + I 

Sbjct: 180 TCTOEADRSPSTADKAAAYKAVCGRLEDRHMLSG-OroMNDGEIAAEHFYMSGRVra^ 238 

Query: 240 ATSDEVAAGLIKGYEESRKKCPYIIGQECLLVGQLLKLPTIDHKSYYLGKIiAFKQALAEK 299 

A SDEVAAG I + + IIG+ + ++L P++D LG AF L ++ 

Sbjct: 239 ANSDOTAftG-IHLFAKKMNWDVEIIGEGtWSISRVLGFPSLDLSm^^ 297 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2171 

A DNA sequence (GBSx2288) was identified in S.agalactiae <SEQ ID 6707> which encodes the amino 
acid sequence <SEQ ID 6708>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3451(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21682 GB:U32686 conserved hypothetical protein [Haemophilus influenzae Rd] 
Identities = 79/264 (29%) , Positives = 134/264 (49%) , Gaps = 16/264 (6%) 



Query: 1 MTIKRIFCDMDGTLENSEGQVSKSNATLIREAA- - - IPVTLVSARAPMEMKDAVDALQLG 57

M K +F D +GTLL S+ +S +1+ IP +SAR+P+ + L+
Sbjct: 1 MMYKAVFSDFNGTLLTSQHTISPRTVWIKRLTANGIPFVPISARSPLGILPYWKQLETN 60

Query: 58 GVQVAENGGLIYRIGDNNQVLPIHTQIIKKSTVKQLLRGIRFHFPQVSLSYYDLNNWYCD 117

V VAF+G LI N + PI++ 1+ + ++ + H P + ++YY N+ +
Sbjct: 61 NVLVAFSGALIL NQNLEPIYSVQIEPKDILEINTW^H-PLLGVNYYTNNDCHAR 115

Query: 118 KID-EGIRYEHSLTQQCPTFIHlfflDQFIiEGHTNTFKINIMITFDEfiNMLELEKYLQSLELP 176

++ + + YE S+T+ IH D+ T + + I + ++E+E L+ + P
Sbjct: 116 DVENKWVIYERSVTK- - - lEIHPFDEVA- - -TRSPHKIQIIGEAEEIIEIEVLLKE-KFP 168

Query: 177 EITIQRSGKAYLEITHLLAKKSKGIAYILQKEQLRREETAAFGDGHNDLPMLEMVGYPIV 236

++I RS +IiE+ H A K + ++ + E AFGD NDL MLE VG +
Sbjct: 169 HLSICRSHftNFIjEVMHKSATKjGSAVRFLEDYFGVQTNEVIAFGnNFNDLDMLEHVG 228

Query: 237 MDNAPDDIKAIAYQLTKSNDEDGV 260

M NA ++IK A +T +N+EDG+
Sbjct: 229 MGNAPNEIKQAANWTATNNEDGL 252

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2172 

A DNA sequence (GBSx2289) was identified in S.agalactiae <SEQ ID 6709> which encodes the amino 
acid sequence <SEQ ID 6710>. Analysis of this protein sequence reveals the following:

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2854(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2173 

A DNA sequence (GBSx2290) was identified in S.agalactiae <SEQ ID 6711> which encodes the amino
acid sequence <SEQ ID 6712>. Analysis of this protein sequence reveals the following:

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -10.51 Transmembrane 392 - 408 ( 376 - 417)
INTEGRAL Likelihood = -9.92 Transmembrane 440 - 455 ( 433 - 461)
INTEGRAL Likelihood = -6.42 Transmembrane 52 - 68 ( 51 - 70)
INTEGRAL Likelihood = -6.32 Transmembrane 29 - 45 ( 9 - 48)
INTEGRAL Likelihood = -6.32 Transmembrane 309 - 325 ( 308 - 328)
INTEGRAL Likelihood = -4.46 Transmembrane 12 - 28 ( 9 - 29)
INTEGRAL Likelihood = -3.29 Transmembrane 463 - 479 ( 462 - 479)
INTEGRAL Likelihood = -2.07 Transmembrane 353 - 369 ( 352 - 369)
INTEGRAL Likelihood = -1.17 Transmembrane 374 - 390 ( 374 - 390)
INTEGRAL Likelihood = -0.85 Transmembrane 247 - 263 ( 247 - 263)
INTEGRAL Likelihood = -0.06 Transmembrane 278 - 294 ( 278 - 294)

Final Results 

bacterial membrane --- Certainty=0.5203(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>
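
Transmembrane predictions in these reports follow the pattern "INTEGRAL Likelihood = <score> Transmembrane <start> - <end> ( <start> - <end>)". The following is a minimal sketch, not part of the original analysis, of how such lines could be collected and ordered by position along the protein. The function name and the dictionary keys are hypothetical, and the sketch makes no assumption about what the parenthesized range represents beyond storing it.

```python
import re

# One predicted segment per line: score, first range, parenthesized range.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return the predicted segments sorted by the start of the first range."""
    segments = []
    for match in TM_RE.finditer(text):
        segments.append({
            "likelihood": float(match.group(1)),
            "span": (int(match.group(2)), int(match.group(3))),
            "paren_span": (int(match.group(4)), int(match.group(5))),
        })
    return sorted(segments, key=lambda seg: seg["span"][0])


report = """
INTEGRAL Likelihood = -10.51 Transmembrane 392 - 408 ( 376 - 417)
INTEGRAL Likelihood = -6.32 Transmembrane 29 - 45 ( 9 - 48)
INTEGRAL Likelihood = -4.46 Transmembrane 12 - 28 ( 9 - 29)
"""
segments = parse_tm_segments(report)
for seg in segments:
    start, end = seg["span"]
    print(f"{start}-{end}: likelihood {seg['likelihood']}")
```

Sorting by position rather than by score makes it easier to compare the predicted segments of two related proteins side by side.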

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC23742 GB:AF052208 competence protein [Streptococcus pneumoniae]
Identities = 325/705 (46%) , Positives = 478/705 (67%) , Gaps = 3/705 (0%) 



Query: 1 MLQLTKYFPLKPIYLALLVFQIYLLVFSWTMLGCAFLLFSFIFLIYQYDRETIFKTIAIV 60

MLQ K F + IYL+ L+ +Y +FS + L +F + L Q+ ++ K + I
Sbjct: 1 MLQWIKNFSIPLIYLSFLLLWLYYAIFSASYLALLGFVFLLVCLFIQFPWKSAfiKVLIIC 60

Query: 61 IFFLFYFLWQ»raNMNVQYQRVPNHISQIKVRIDTISINGDVLSFQADASGI!OT 120

F F+F++QN + Q + + + ++++ DT+ +NGD LSF+ AG +Q +Y L+
Sbjct: 61 GIFGFWFVFQNWQQSQASQNLADSVERVRILPDTVKWSrGDSLSFRGKADGRIFQVYYKLQ 120

Query: 121 NKSEKDYFQNLDNNIMIIADIKLEEAEERRHFNGFDYRQYLKRHGIYRIAKVTKIKQIRL 180

++ EK+ FQL+ I + KLEE +R+F GF+Y+ YLK GIY+ + KI+ ++
Sbjct: 121 SEEEKE2\FQALTDLHEIGLEGKLSEPEGQRNFGGFNYQAYLKTQGIYQTLNIKKIQSLQK 180

Query: 181 FQHRSFFALMSKWRRSAIVISQT-FENPMRHYMSGLLFGYLDKTFDDMSDLYSSLGIIHL 239

+S RR A+V +T FP+PMR+YM+GLL G+LD F++M++LySSLGIIHL
Sbjct: 181 IGSWDI6ENLSSLRRKAVVWIKTHFPDP^^ENY^TO3LLI,GHLDTDFEEMNELYSSLGIIHL 240

Query: 240 FALSGMQVGFFLGIFRYICLRIGLRLDHVWLLQIPPSLIYAGLTGFSISWRALIQSLLS 299

FALSGMQVGFF+ F+ + LR+GL + + L PFSLIYAGLTGFS SV+R+L+Q LL+
Sbjct: 241 FALSGMQVGFFMNGFKKLLLRLGLTQEKLKWLTYPFSLIYAGLTGFSASVIRSLLQKLLA 300

Query: 300 HSGVKKDENFALCLLICLISLPHSLLTTGGVLSFAYAFILTMTSFDHFSSIKKTAIESLT 359

GVK +N AL +L+ I +P+ T GGVLS AYAFILTM S + +K VA ESL
Sbjct: 301 QHGVKGLDNCALTVLVLPIVMPNFFFTAGGVLSCAYAFILTMPSKEG-EGLKAVASESLV 359

Query: 360 VSVGILPILTYYFSGFQPISIILTALLSFAFDIIFLPLLTVIFVLSPIVKLSCINSLFEI 419

+S+GILPIL++YF+ FQP SI+LT + SF FD+ FLPLL+++FVLS + + +N -l-FE
Sbjct: 360 ISLGILPILSFYFAEFQPWSILLTFVFSFLFDLTFLPLLSILFVLSFLYPVIQLNFIFEW 419

Query: 420 LEVLLKWTGQLFPRPLIFGKPSLFLLIVMIIILGLLYDYYHSKCFRYCSLLIIFTLFFIT 479 

LE +++ Q+ RPL+FG+P+ +LLI+++I L L+YD + L+I LF +T 

Sbjct: 420 LEGIIRLVSQWSRPLWGQPNTWLLILLIiISLftLVYDLRKNIKKLTVLCLLITGLFLLT 479 

Query: 480 KNPITNEVAIIJIVGQGDSILVRDWIKSKTILIDTGGRTO-FEQPEEWKQKTOQSNAKRTLI 538 

K+P+ NE+ +IjDVGQG+SI +RD GRTILID GG+ +++ ++W++K+ SHA+R+LI 
Sbjct: 480 KHPLENEITMLDVG(X3ESIFLRDVTOKTILIDViCMKAESYKKIKKWQEKMTTSm^ 539 

Query: 539 PYLKSRGISKIDDLVITHTDTDHMGDMEVISKHFKVARLITSSGSLTNSQYVKHLSKIGV 598 

PYLKSRG++KID L++T+TD +H+GD+ ++K F V ++ S SL ++V L 
Sbjct: 540 PYLKSRGVAKIDQLILTNTDKEHVGDLSErWKZVFHVGEILVSKDSLKQKEFVaELQATQT 599 

Query: 599 AVKSIEAGDKLAVMGSYLQVLYPWHKGDGKNNDSIVLYGHLLGKGFLFTGDLEEEGEKQL 658

V+S+ G+ L + GS L+VL P GDG ++D++VLYG L K FLFTG+LEE+GEK L 
Sbjct: 600 KVRSMIVGENLPIFGSQLEVLSPRKMGDGGHDDTLVLYGKFLDKQFLFTGNLEEKGEKDL 659 

Query: 659 LEAYPNLSVDILKaGHHGSKGSSSLSFLKKLSPSWLVSAGKNNR 703 
L+ YP+Ij V++LKA H6+K SSS +FL+KL P + L+S GK+NR

Sbjct: 660 LKHYPDLKMWLKaSQHGNKKSSSPAFLEKriKPEIiTLISVGKSNR 704 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6713> which encodes the amino acid 
sequence <SEQ ID 6714>. Analysis of this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have an uncleavable N-term signal seq



INTEGRAL Likelihood = -10.19 Transmembrane 394 - 410 ( 380 - 422)
INTEGRAL Likelihood = -8.28 Transmembrane 54 - 70 ( 52 - 72)
INTEGRAL Likelihood = -6.32 Transmembrane 356 - 372 ( 355 - 377)
INTEGRAL Likelihood = -4.73 Transmembrane 8 - 24 ( 7 - 25)
INTEGRAL Likelihood = -4.30 Transmembrane 30 - 46 ( 29 - 50)
INTEGRAL Likelihood = -3.88 Transmembrane 249 - 265 ( 249 - 267)
INTEGRAL Likelihood = -3.40 Transmembrane 457 - 483 ( 465 - 484)
INTEGRAL Likelihood = -2.39 Transmembrane 325 - 341 ( 325 - 347)
INTEGRAL Likelihood = -0.43 Transmembrane 441 - 457 ( 441 - 458)

Final Results

bacterial membrane --- Certainty=0.5076(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC23742 GB:AF052208 competence protein [Streptococcus pneumoniae] 
Identities = 311/706 (44%) , Positives = 458/706 (64%) , Gaps = 10/706 (1%) 

Query: 5 WTKLVPLSKIQFAFLILVFFYQIHSPSWLTFL-LSLSLICLLVKRLSKK--EFLGVFAIL 61 

W K +1 +FL+L +Y I S S+L L L+CL ++ K + L + I 

Sbjct: 4 WIKNFSIPLIYLSFLLLWLYYAIFSASYLALLGFVFLLVCLFIQFPWKSA6KVLIICGIF 63 



Query: 62 SFCALFLLYQKQQLVQKLEIQPVQITSVaLVPDSIRINGDQLAVLGRHGKHSYQLFYRLK 121

F +F +Q+ Q Q L + V ++PD++++NGD L+ G+ +Q++Y+L+ 

Sbjct: 64 GFWFVFQ^mQQSQRSQNLADS---VERTOILPDTVKVNGDSLSFRGKaI)GRIFQVYYiCLQ 120 



Query: 122 SQAEAQLFKKEHRWLVNlHAKVTLEKAEEVRNFRGFIireQTFLTYQGIYRIGKVEQIEQLEV 181 
S+ E + P+ + + L + E RNF GFNYQ +L QGIY+ +++I+ L+

Sbjct: 121 SEEEKEAFQALTDLHEIGLEGKLSEPEGQRNFGGFNYQAYLKTQGIYQTLNIKKIQSLQK 180 

Query: 182 ISPESICDYLSSLRRRAIVHCQQHFPRPMSHYLTGLLFGYLDKSPGEMTDYYSQLGIIHL 241 

I I + LSSLRR+A+V + HFP PM -i-Y+TGLL G+LD F EM + YS LGIIHL 
Sbjct: 181 IGSWDIGENLSSLRRKAWWIKTHFPDPMRNYMTGLLLGHLDTDFEEMNELYSSLGIIHL 240






Query: 242 FALSGMQVGFFLTCFRRVLLLLAVPLEWIKWIELPFACFYAALTGYSISVIRSLVQSQLR 301 

FALSGMQVGFF+ F+++LL L + E +KW+ PF+ YA LTG+S SVIRSL+Q L 
Sbjct: 241 FALSGMQVGFFMNGFKKIiLRLGLTQEKLKWLTYPFSLIYAGLTGFSASVIRSLLQKLLA 300 
Query: 302 HLGIKGIJ3NIiACTFLLVFLm)MJFIJyiTOGGVLTFSYAFLLT\A7TVEELSGaKRQLVQVLT 361

G+KGLDN A T L++F+ +F T GGVL+ +YAF+LT+ + +E G K + L 
Sbjct: 301 QHGVKGLDNCALTVLVLFIVMPNFFFTAGGVLSCAYAFILTMPS-KEGEGLKAVASESLV 359 

Query: 362 ISLGILPFLLFYFSSFNPMSMVIjTGLLSYLFDLFILPriLCLVFCLSPLVTVSICNHLFIL 421

ISLGILP L FYF+ F P S++LT + S+LPDL LPLL ++F LS L V N +P 
Sbjct: 360 ISLGILPILSFYFAEFQPWSILLTFVFSFLFDLTPLPLLSILFVLSFLYPVIQIiNFIFEW 419 

Query: 422 LEKVIQFLGNTFNSSLVFGSPTSWHLLILVISFAIFYDYRQ-VRQRVITCGLVIALTLLS 480 
LE +1+ + + LVFG P +W L++L+IS A+ YD R+ +++ + C L+ Ii LL+

Sbjct: 420 LEGIIRLVSQVTSRPLVFGQEinmLILLLISIiaLVYDLRKNIKiaiTVLCLLITGIi^^ 479 

Query: 481 VKYPLTISIEVTFIDIGQGDSILVREWTGKNLLIDVGGR-PFFSSKEHWRRGHHVaNAQK^^ 539 
K+PL NE+T +D+GQG+SI +R+ TGK +LIDVGG+ + + W+ +1IAQ++L 
Sbjct: 480 -KHPLENEITMIOTGQGESIFLRDWGKTILIDVGGKAESYKKIKKWQEKMTTSNAQRSL 538

Query: 540 IPYLKSRGIHTIDQLLVTHADTDHMGDIEWAKAIRIKEILTSQGSLSHPSPVRRLRRLK 599 

IPYLKSRG+ IDQL++T+ D +H+GD+ + KA + EIL S+ SL FV L+ + 
Sbjct: 539 IPYLKSRGVAKIDQLILXNTDKEHVGDLSEMTKAFHVGEILVSKDSLKQKEFVAELQATQ 598 


Query: 600 CHVRVIJWSDQLPIMGSVLQVLYPWQIflDGKiraDSLVLYGRLLNRTFLFTGDLEffi^ 659 

VR + G+ LPI GS L+VD P ++GDG ++D+LVLYG+ L++ FLFTG+LE++GE + 
Sbjct: 599 TKVRSMIVGENLPIFGSQLEVLSPRKMGDGGHDDTLVLYGKFLDKQFLFTGNLEEKGEKD 658 

Query: 660 IIKRYPQLRVDYLKAGHHGSNTSSSAAFLDHIQPKVAFISAGKNNR 705

++K YP L+V+ LKA HG+ SSS AFL+ ++P++ IS GK+NR 
Sbjct: 659 LLKHYPDLKVOTLKASQHGlinCKSSSPAFIiEKLKPELTLISVGKSNR 704 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 346/743 (46%), Positives = 491/743 (65%), Gaps = 3/743 (0%)

Query: 5 TKYFPIiKPIYLALLVFQIYIiLVFSWTMLGCAFIiLFSFIFLIYQYDRETIFK.TIAIVIFFL 64 

TK PL I A L+ + + S' + L L L+ + ++ AI+ F 

Sbjct: 6 TKLVPLSKIQFAFLILVFFYQIHSPSWLTFLLSLSLICLLVKRLSKKEFLGVFAIIiSFCa 65 


Query: 65 FYFLWQOT^NMm?QYQRVE^raISQIKVRIDTISING0VLSFQArlASGl!^TQAFYTLK^ 124 

+ L+Q + + + P 1+ + + D+I INGD L+ ++YQ FY LK+++E 

Sbjct: 66 LFLLYQKQQLVQKLEIQPVQITSVRLVPDSIRINGDQLAVLGRHGKHSYQIiFYRLKSQaE 125 

40 Query: 125 KDYFQNLDNNIMIIADIKLEEAEERRHFNGFDYRQYLKRHGIYRIAKVTKIKQIRIjFQHR 184 

F+ +++ A + LE+AEE R+F GF+Y+ +L GIYRI KV +I+Q+ + 

Sbjct: 126 AQLFKKEHRWLVMHAKVTLEKBEEVRNFKGENYQTFLTYQGIYRIGKVEQIEQLEVISPE 185 

Query: 185 SFFALMSKWRRSAIV-ISQTFPNPMRHYMSGIiLFGYLDKTFDDMSDLYSSLGIIHIiFALS 243 

45 S +S RR AIV Q FP PM HY++GLLFGYLDK+F +M+D YS LGIIHLFALS 

Sbjct: 186 SICDYLSSLRRRAIVHCQQHFPRPMSHYLTGLLFGYLDKSFGEMTDYYSQLGIIHLFALS 245 

Query: 244 GMQVGFFLGIFRYICLRIGIiRIiDHVWLLQIPFSLIYAGLTGFSISVVEALIQSLLSHSGV 303 
GMQVGFFL FR + L + + L+ + +++PF+ YA LTG+SISV+R+L+QS L H G+ 
50 Sbjct: 246 GMQVGFFLTCFRRVLLLLAVPIiEWIKWIELPFACFYAALTGYSISVIRSLVQSQLRHLGI 305 

Query: 304 KKDENFALCLLICLISLPHSLLTTGGVLSFAYAFILTMTSFDHFSSIKKVAIESLTVSVG 363 

K +N A L+ + H L+T GGVL+F+YAF+LT+ + + S K+ ++ LT+S+G 
Sbjct: 306 KGII)mACTFIiIiVFLVOyaFLl)mraGVLTFSYAFLLTVVTVEELSGAKRQLVQ 365 


Query: 364 ILPILTYYFSGFQPISIILTALLSFAFDIIFLPLLTVIFVLSPIVKLSCINSLFEILEVL 423 

ILP h +YFS F P+S++LT I1LS+ FD+ LPLL ++F LSP+V +S N LF +LE + 
Sbjct: 366 ILPFLLFYFSSENPMSMVLTGLLSYLFDLPILPLLCLVFCLSPLVTVSICNHLFIIiLEKV 425 

60 Query: 424 LKWTGQLFPRPLIFGKPSLFLLIVMIIILGLLYDYYHSKC-FRYCSLLIIFTLFFITKNP 482 

+++ G F L+FG P+ + L++++I + YDY + C L+I TL + K P 

Sbjct: 426 IQFLGNTFNSSLVFGSPTSWHLLILVISFAIFYDYRQVRQRVITCGLVIALTLLSV-KYP 484 

Query: 483 ITNEmiLDVGQGDSILVRDWLGKTILIDTGGRWFEQPEEWKQKVNQSNAKRTLIPYLK 542 
65 +TNEV +D+GQGDSILVR+W GK +LID GGR F E W++ + +NA++TLIPYLK 

Sbjct: 485 LTNEVTFIDIGQGDSILVREWTGKNIilDVGGRPFFSSKEHWRRGHHVAKAQKTLIPYLK 544

Query: 543 SRGISKIDDLVITHTDTDHMGDMEVISKHFKWARLITSS'GSLTNSQYVKHLSKIGVAVKS 602
 SRGI ID L++TH DTDHMGD+EV++K ++ ++TS GSL++ +V+ L ++ V+
Sbjct: 545 SRGIHTIDQLLVTHADTDHMGDIEWAKAIRIKEILTSQGSLSHPSFVRRLRRLKCHWV 604

Query: 603 lEAGDKLAVMGSYLQVLYPWHKGDGKNiroSIVLYGHLLGKGFLFTGDLEEEGEKQLLEAY 662
 + AGD+Ii +MGS LQVLYPW GDGKNNDS+VLYG hh + FLFTGDLE+EGE ++++ Y
Sbjct: 605 LRAGDQLPIMGSVLQVLYPWQLGDGKNOTJSLVLYGRLIMITFLFTGDLEKEGENEIIKRY 664

Query: 663 PNLSVDILKAGHHGSKGSSSLSFLKKLSPSWLVSAGKNNRYQHPHQETLQRFQKIKSKI 722
 P L VD LKAGHHGS SSS +FL + P V +SftGKNNRYQHPH+ETL R + +
Sbjct: 665 PQLRVDYLKRGHHGSOTSSSaAEI^HIQProfflf'ISAGKNNRYQHPHRETLARLEDRQITY 724

Query: 723 FRTDQSGTIRLTGWWKWHIQTVR 745
 +RTD 6 IRLTG WH++TVR
Sbjct: 725 YRTDTQGAIRLTGRTSWHLETVR 747





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2174 

A DNA sequence (GBSx2291) was identified in S.agalactiae <SEQ ID 6715> which encodes the amino
acid sequence <SEQ ID 6716>. This protein is predicted to be competence protein (comEA). Analysis of
this protein sequence reveals the following:

Possible site: 38

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -3.77 Transmembrane 18 - 34 ( 14 - 36)



Final Results 

bacterial membrane — Certainty=0. 2508 (Affirmative) < suco 

bacterial outside — Certainty=0 . 0000 (Not Clear) < suco 

30 bacterial cytoplasm — Certainty=0 . 0000 (Not Clear) < suco 
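The "Final Results" block above lists one certainty value per candidate location. A minimal sketch of how such a block could be read programmatically follows; the function names and the tolerance for stray spaces around the decimal point are assumptions made for this illustration, not part of the original analysis.

```python
import re

# One "Final Results" line looks like:
#   bacterial membrane ... Certainty=0. 2508 (Affirmative)
# The transcription sometimes puts spaces around the decimal point,
# so the pattern allows optional whitespace there (an assumption).
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>[a-z]+).*?Certainty\s*=\s*"
    r"(?P<whole>\d)\s*\.\s*(?P<frac>\d+)\s*\((?P<call>[^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples, in input order."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            certainty = float(m.group("whole") + "." + m.group("frac"))
            results.append((m.group("site"), certainty, m.group("call").strip()))
    return results


def best_site(results):
    """Return the site name with the highest certainty value."""
    return max(results, key=lambda r: r[1])[0]
```

Applied to the three lines above, this sketch would report the membrane as the highest-scoring location, matching the "(Affirmative)" call printed in the block.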

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC23741 GB:AF052208 competence protein [Streptococcus pneumoniae]
Identities = 96/217 (44%) , Positives = 138/217 (63%) , Gaps = 4/217 (1%) 

35 

Query: 3 EIVLEKIKSHKWETTGIIVGLLLFGILGLNHFG-THHKEDNtNINLEK-KVSTITEKKVP 60 

E ++EKIK +K +GLL+ G L T KE NL + ++EK+V 

Sbjct: 2 EAIIEKIKEYKIIVICTGLGLLVGGFFLLKPAPQTPVKETNLQAEVAAVSKDLVSEKEVN 61 

40 Query: 61 MlSHVKDKVSNQVTVDVKGAVNHPGVYSLPSQSRVTDAIKRAGGLSNLADSKSVNIjaQiaii 120 

+ + +TVDVKGAV PG+Y LP SR+ nA+++AGGL+ ADSKS+NLAQK+ 
Sbjct: 62 KEEKEEPLEQDLITVDVKGAVKSPGIYDLPVGSRINDAVQKAGGLTEQADSKSLNLAQKV 121 

Query: 121 QDETVIYVAQKGEKITWEEEKflNNIATQGNSKGKINLNKADLSSLQTISGVGAKRAQDI 180 
45 DE ++YV KGE+ V ++ A+ + + K+NIiNKA L L+ + G+G KRAQDI 

Sbjct: 122 SDEALVYVPTKGEE--AVSQQTGLGTASSISKEKKVNLNKASLEELKQVKiGLGGKRAQDI 179 

Query: 181 LDYRDSQGGFKTIDDLKNVSGIGEKTLEKLRQDVTID 217 
+D+R++ G FK++D+LK VSGIG KT+EKL+ VT+D 
50 Sbjct: 180 IDHREANGKFKSVDELKKVSGIGGKTIEKLKDYVTVD 216 
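Each alignment row above pairs a Query line and a Sbjct line with a midline in which a letter marks an identical residue and `+` marks a conservative substitution. The sketch below counts both for the last row of this alignment (residues 181 to 217 of the query). The function names are hypothetical. Because whitespace in the transcribed midline appears to have been collapsed (the midline is shorter than the two sequence rows), the midline is only tallied by symbol and is not used positionally.

```python
def count_identities(query, sbjct):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


def count_midline(midline):
    """Return (identities, conservative) from a midline: letters vs '+' signs."""
    identities = sum(1 for c in midline if c.isalpha())
    conservative = midline.count("+")
    return identities, conservative


# Last row of the alignment above, copied from the text.
query = "LDYRDSQGGFKTIDDLKNVSGIGEKTLEKLRQDVTID"
midline = "+D+R++ G FK++D+LK VSGIG KT+EKL+ VT+D"
sbjct = "IDHREANGKFKSVDELKKVSGIGGKTIEKLKDYVTVD"

if __name__ == "__main__":
    print(count_identities(query, sbjct))  # identical columns in this row
    print(count_midline(midline))          # (letters, plus signs) in the midline
```

Summing the per-row counts over all rows of an alignment gives the numerators of the "Identities" and "Positives" fields, provided the rows are free of transcription errors.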

A related DNA sequence was identified in S. pyogenes <SEQ ID 6717> which encodes the amino acid 
sequence <SEQ ID 6718>. Analysis of this protein sequence reveals the following: 

Possible site: 36 






>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.61 Transmembrane 22 - 38 ( 16 - 42)



Final Results

bacterial membrane Certainty=0.4843 (Affirmative) < suco

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm — Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases:

>GP:AAC23741 GB:AF052208 competence protein [Streptococcus pneumoniae] 
Identities = 82/179 (45%) , Positives = 124/179 (68%) , Gaps = 4/179 (2%) 

Query: 42 NRQSKaAVPALREISP\nKXMVSEEKKEIQEDSSILVDLKXaVQKEGVYKLTASSRVRDVI 101 
10 N Q++ A + +++ K+ EEK+E E I VD+ia3AV+ G+Y L SR+ D + 

Sbjct: 42 NLQREVaaVS-KDLVSEKEVNKEEKEEPLEQDLITVDVKGAVKSPGITOLPVGSRINDAV 100 

Query: 102 ELAGGLTSEADKHAINFAEKLTDEQWYVPKQGEEISVLPRSLVSGKKETASKDQSKVHI 161 
+ AGGLT +AD ++N A+K++DE +VYVP +GEE + + G + SK++ KV++ 
15 Sbjct: 101 QKftGGLTEQftDSKSIJiniAQKVSDE2iLVYVPTKBEE--AVSQQT6LGTASSISKEK-KVm 157 

Query: 162 NKASLEELQHIPGIGRKRAQDIIDMRDKlXSSFKftlBDLRQVSGIGEKTIiEKLKDDIFIiD 220 

NKaSLEEL+ + G+G KRAQDIID R+ G FK++++L++VSGIG KT+EKLKD + +D 
Sbjct: 158 NKASLEELKQVKGLGGKRAQDIIDHREANGKFKSVDELKKySGIGGKTIEKLKDYVTVD 216 

20 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/166 (48%) , Positives = 111/166 (66%) , Gaps = 10/166 (6%) 

Query: 62 ISHVKDKVSNQ VTVDVKGRVNHPGVYSLPSQSRVTDAIKRAGGLSNLADSK 112 

25 , IS VK +VS + + VD+KiGAV GVy L + SRV D 1+ AGGri++ AD 

Sbjct: 55 ISPVKQQVSEEKKEIQEDSSILVDLKGAVQKEGVYKLTASSRVRDVIELAGGLTSEADKH 114 

Query: 113 SVrnJ^QKLQDETVIYVAQKGEKITVVEEEKAKINIA-TQGNSKGKINXJ!!^ 171 
++N A+KL DE V+yV ++GE+I+V+ + T + K+++NKA L LQ I G 

30 Sbjct: 115 AINFAEKLTDEQWYVPKQGEEISVLPRSLVSGKKETASKDQSKVHINKASLEELQHIPG 174 

Query: 172 VGAKRAQDILDYRDSQGGFKTIDDLKNVSGIGEKTLEKLRQDVTID 217 

+GAKRAQDI+D RD GGFK ++DL+ VSGIGEKTLEKL+ D+ +D 
Sbjct: 175 IGAKRAQDIIDMRDKLGGPKALEDLRQVSGIGEKTLEKLKDDIFLD 220 
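The summary line that heads this alignment (`Identities = 81/166 (48%) , Positives = 111/166 (66%) , Gaps = 10/166 (6%)`) can be parsed and sanity-checked with a few lines of code. The sketch below is illustrative only: the helper names are made up, and comparing against a truncated integer percentage is an assumption that happens to agree with this particular line, not a statement about how the alignment program computes its figures.

```python
import re

# Matches fragments such as "Identities = 81/166 (48%)".
PAIR_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)%\s*\)"
)


def parse_summary(line):
    """Return {field: (numerator, denominator, printed_percent)}."""
    stats = {}
    for name, num, den, pct in PAIR_RE.findall(line):
        stats[name] = (int(num), int(den), int(pct))
    return stats


def truncated_percent(num, den):
    """Integer percentage with the fractional part dropped."""
    return (100 * num) // den


def check_summary(line):
    """Return the fields whose printed percent differs from the truncated ratio."""
    return [
        name
        for name, (num, den, pct) in parse_summary(line).items()
        if truncated_percent(num, den) != pct
    ]
```

A non-empty result from `check_summary` would flag a line worth re-reading against the source scan, since a digit may have been misrecognised.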

35 

A related GBS gene <SEQ ID 8989> and protein <SEQ ID 8990> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 5.70 
GvH: Signal Score (-7.5): -2.58

Possible site: 38
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -3.77 threshold: 0.0

INTEGRAL Likelihood = -3.77 Transmembrane 18 - 34 ( 14 - 36)
PERIPHERAL Likelihood = 10.40 73

modified ALOM score: 1.25 

*** Reasoning Step: 3 

Final Results

bacterial membrane — Certainty=0. 2508 (Affirmative) < suco 

bacterial outside — Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm — Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases:

44.3/64.1% over 215aa
Streptococcus pneumoniae
GP|3211753| competence protein Insert characterized






ORF01930(304 - 951 of 1014) 

GP|3211753|gb|AAC23741.1||AF052208(1 - 216 of 216) competence protein {Streptococcus
pneumoniae}
%Match =25.0

%Identity =44.2 %Similarity =64.1
Matches = 96 Mismatches = 75 Conservative Sub.s = 43

5 90 120 150 180 210 240 270 300 

DDGKraNPLTYlYRLPLAIIAIVLLVLTLIFSYIiftSFVWDPQKin^K*GLHGNyLLFSK*F™^ 



330 360 390 417 447 474 504 534 

MFEIVLEKIKSHKIffiTTGIIVGLLLFGILGIJraFG-THHKEDNIiNINLEK-KVSTITEKKVPMISH\n^ 

10 I =1 =111= 1=1 I II II = ==l|:| = = =11111 

MEAIIEKIKEYKIIVICTGLGLLVGGFFLLKPAPQTPVKETra:.QAEVaAVSKDLVSEKEVNKEEKEEPLEQDLITVDVK 

10 20 30 40 50 60 70 

564 594 624 654 684 714 744 774 

1 5 GAVNHPGVySLPSQSROTDAIKRAGGLSNLM)SKSVNIAQKLQDEWIYVAQKGEKITVVXEEKANNIATQGNS^^ 

III Ihl II II: lh::|li|: llllhllllj: || ::|| |||: | :: j: : : |:|| 

GAVKSPGIYDLPVGSRIimVQKftGGLTEQADSKSrjn:iaQKVSDEALVYVPTKGEE--AVSQQra 

90 ' 100 110 120 130 140 150 

20 804 834 864 894 924 954 984 1014 

NKADI.SSICTISGVGAKRAQDILDYRDSQGGFKTIDDLKNVSGIGEKTIliEKLRQDVTID*VFSSKTYLFSIVGLPNIiLTS 

III I h : hi lllllhhh: I lh:|:|r lllll Ihllh Ihl 
NKASLEELKQVKGLGGKRAQDIIDHREAiraKFKSVDELKKVSGIGGKTIEKLKDYVTVD 

170 180 190 200 210 

SEQ ID 8990 (GBS129) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 41 (lane 4; MW 43.8kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2175 

A DNA sequence (GBSx2292) was identified in S.agalactiae <SEQ ID 6719> which encodes the amino
acid sequence <SEQ ID 6720>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-14.01 Transmembrane 215 - 231 ( 208 - 240) 



Final Results 

bacterial membrane Certainty=0.6604 (Affirmative) < suco

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12793 GB:Z99109 similar to 1-acylglycerol-3-phosphate
O-acyltransferase [Bacillus subtilis]
Identities = 66/200 (33%) , Positives = 111/200 (55%) , Gaps = 10/200 (5%) 

Query: 3 YTYLRTLVMFLIWVaNGNAHYHNEDKVn^KDDENYILVRPHRTFWDPraiAFAaRPKQF^ 62 

y + + ++ + G Y+ E+ L D +++ H+D + + PQ + 

Sbjct: 2 YKFCSiNALKVILSLRGGVK\miKEN--LPADSGFVIACTHSGWVDVITLGVGILPYQIHy 59 

50 Query: 63 MAKKELFTNRLFGWWIKMCGAFPIDREKPGQDAIRYPVKMLKNSNRSLVMFPSGSRHSKD 122 

MAKKELF N+ G ++K AFE+DRE PG +1+ P+K+LK + +FPSG4-R S+D 

Sbjct: 60 MAKKELFQNKWIGSFLKKIHAFPVDRENPGPSSIKTPIKLLK-EGEIVGIFPSGTRTSED 118 

Query: 123 V--KGGVAVIAKMAKVRIMPAAYRGPMVFKNLLKBHRVD^mFGNPI^VSDIKRMnA-EGI 179 
55 V K G lA+M K ++PAAY+GP KLK+++GP++D + + E + 

Sbjct: 119 VPLKRGAVTIAQMGKAPLVPAAYQGPSSGKELFKRGKMKLIIGEPLHQADFAHLPSKERL 178 

Query: 180 A EVSRRIQEEFDRLDR 195 

A +++RI+E ++LD+ 

Sbjct: 179 AAMTEALNQRIKELEmLDQ 198

A related DNA sequence was identified in S.pyogenes <SEQ ID 6721> which encodes the amino acid
sequence <SEQ ID 6722>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.83 Transmembrane 241 - 257 ( 234 - 266) 
INTEGRAL Likelihood = -4.41 Transmembrane 27 - 43 ( 26 - 44) 

Final Results 

bacterial membrane Certainty=0.5734 (Affirmative) < suco

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm — Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAB12793 GB:Z99109 similar to 1-acylglycerol-3-phosphate
O-acyltransferase [Bacillus subtilis]
Identities = 59/198 (29%) , Positives = 104/198 (51%) , Gaps = 6/198 (3%) 

Query: 29 YAYLRGLWFLLWVVNGNAHYHHEEKMLDASENYILVAPHRTFWDPVYMAFAARPKQFIF 88 

20 Y + + +L + G Y+ E L A +++ H+D + + PQ + 

Sbjct: 2 YKFCANALKVILSLRGGVKVYNKEN--LPADSGFVIACTHSGWVDVITLGVGILPYQIHY 59 

Query: 89 MAKKELFANRLFAWWIKMCGAFPIDRDKPSPDAIRYPVNMLKKSNRSLLMFPSGSRHSQE 148 
MAKKELF N+ ++K AFP+DR+ P P +1+ P+ +LK+ + +FPSG+R S++ 
25 Sbjct: 60 MAKKELFQNKWIGSFLKKIHAFPVDRENPGPSSIKTPIKLLKE-GEIVGIFPSGTRTSED 118 

Query: 149 V--KBGVAVIAKLAKVKIMPAAYQ6PMSVKGLLAGERVDMTFGNPIDVSDIKRM-NDEGI 205 

V K G IA++ K ++PAAYQGP SKL +++GP++D + + E + 
Sbjct: 119 VPLKRGaVTIflO«GKAPLVPAAYQGPSSGKELFKKGKMKLIIGEPLHQaDEaHLPSKERL 178 

30 

Query: 206 AEVANRIQAEFDRIDDEL 223 

A + + ++++L 
Sbjct: 179 AAMTEALNQRIKELENKL 196 

An alignment of the GAS and GBS proteins is shown below.

Identities = 186/244 (76%) , Positives = 212/244 (86%) 

Query: 1 MFYTYLRTLVMFLIWVANGNAHYHNEDKMLKDDENYILVAPHRTFWDFVYMAEAARP^^ 60 
+FY YLR LV+FL+WV NGNAHYH+E+KML ENYILVAPHRTPWDPVYMAPAARPKQF 
40 Sbjct: 27 VFYAYLRGLVVFLLWVVNGNAHYHHEEKMLIffiSENYILVAPHRTFVroPVYMAPAARPRQF 86 

Query: 61 IFMAKKELFTNRLFGWWIKMCGAFPIDREKPGQDAIRYPVKMLKNSNRSLVMFPSGSRHS 120 

IFMAKKELF NRLF WWIKMCGAFPIDR+KP DAIRYPV MLK SNRSL+MFPSGSRHS 
Sbjct: 87 IFMAKKELFAJ^FAWWIKMOSAFPIDRDKPSPDAIRYPVNMLKKSNRSLLMPPSGSRHS 146 

45 

Query: 121 KDVKGGVAVIAKMAKVRIMPAAYRGPMVPKOTiLKGHRVDMNFGNPIDVSDIKRMDAEG 180 

++VKjGGVAVIAK+AKV+IMPAAY+GPM K LL G RVDM FGNPIDVSDIKRM+ EGIA 
Sbjct: 147 QEVKGGVAVIAKIAKVKIMPAAYQGPMSVKGLXAGERVDMTFGNPIDVSDIKRMNDEGIA 206 

50 Query: 181 EVSRRIQEEFDRLDRENETYDDGKKLNPLTYIYRLPLAIIAIVLLVLTLIFSYLASFVWD 240 

EV+ RIQ EFDR+D E + GK NPLTY+YRLPL ++ +V+L+LT++FSY+ASFVW+ 
Sbjct: 207 EVANRIQAEFDRIDDELAPFQPGKARNPLTYLYRLPLGLVLVVVLLLTMLFSYIASFVWN 266 

Query: 241 PQKH 244 
55 P KH 

Sbjct: 267 PDKH 270 

SEQ ID 6720 (GBS171) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 36 (lane 2; MW 25kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 3; MW 49.8kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2176

A DNA sequence (GBSx2293) was identified in S.agalactiae <SEQ ID 6723> which encodes the amino
acid sequence <SEQ ID 6724>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3268 (Affirmative) < suco

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11810 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis]

Identities = 113/244 (46%) , Positives = 173/244 (70%) , Gaps = 2/244 (0%) 

Query: 6 LKENERIDQLFSTDVKIIQNKEVFSYSIDSVLLSRFPKLP-SRGLIVDLCSGNGAVGLFA 64 

L ++ER+D L + D+KIIQ+ VF++S+D+VLLS + F +P +G IVDLC+GNG V L 
20 Sbjct: 4 LHDDERLDYLLAEDMKIIQSPTVFAFSLDAVLLSKFAYVPIQKGKIVDLCTGNGIVPLLL 63 






Query: 65 STKTNATIIEIELQESI^MMRSIKIJiJKLEKQVTMINDDLKNLIiDHVQRSNVDimC^^ 124 

ST++ A 1+ +E+QE L DMA RS++ NKL+ Q+ +I+DDLKN+ + + + D++- CNP 
Sbjct: 64 STRSKftDILGVEIQERLHDMAVRSVEYNKLDDQIQIIHDDLKtMPEKI/SHNRYIWOT 123 

Query: 125 PYFKaSETSKKNLSPHYLLflRHEITTMLREICQIAQHaLKTKGRIAMVHRPDRFLEIIDT 184 

PYFK + +++N++ H +ARHEI L ++ ++ LK G+ A+VHRP R LEI + 
Sbjct: 124 PYFKTPKQTEQNMNEHIiRIARHEIHCTI^IJVISVSSKLLRQGGKAAL^^ 183 



30 Query: 185 MRQFTOIAPKRIQFVYPKLGKDANMLLIEAIKDGSTEGMKILPPLVVHQDNGDYTETIFDI 244 

M+ + + PKR+QFVYPK GK+AN +L+E IK G + +KILPPL V+ + +YT+ I I 
Sbjct: 184 MKAYQIEPKRVQFVYPKQGKEftNTILVEGIKBGRPD-LKILPPLFVYDEQNEYTKEIRTI 242 

Query: 245 YFGE 248 

35 +G+ 

Sbjct: 243 LYGD 246 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6725> which encodes the amino acid 
sequence <SEQ ID 6726>. Analysis of this protein sequence reveals the following: 

40 Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0. 2183 (Affirmative) < suco 

45 bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside — Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below.

Identities = 200/257 (77%) , Positives = 228/257 (87%) , Gaps = 3/257 (1%) 



Query: 1 MIDTILKENERIE)QLFSTDVKIIGlSIKEVFSYSIDSVriLSRFPKIiPSRGr.IVDLCSGNGAV 60 

MI ILKE ERIDQLFS+DV IIQNK+VFSYSIDSVLLSRFPK+PS+GLIVDLCSGNGAV 
Sbjct: 1 MIKAILKEGERIDQLFSSDVGIIQNKDVFSYSIDSVLLSRFPKMPSKGLIVDLCSGNGAV 60 



55 Query: 61 GLFASTKTNATIIEIELQESLADMAKRSIKINKLEKQVTMINDDLKNLLDHVQRSNVDLM 120 

GLFAST+T A 1+E+ELQE lADM +RSI+I1N+I1E QVTMI DDLKNLL+HV RS VDLM 
Sbjct: 61 GLFASTRTKAAIVEVEI^JERLADMGQRSIQLNQLEDQVTMICDDLKlffiLNHVPRSGVDm 120 



Query: 121 LCNPPYFKftSETSKlCISrLSPHYLLaRHEITTmREICQIAQHaLKTKBRIAMVH^ 180
LCNPPYFK+ E+SKKN+S HYLLARHE+TTNL EICQ+A+HALK+ GR+AMVHRPDRFLE
Sbjct: 121 LCMPPYFKSHESSKKNVSEHYLIjiRHEVTTNLEEICQVARHaLKSNGRLAimmRPDR^^ 180 

Query: 181 IIDTMRQHTO^KRIQBVyPKLGKnANMLLIEAIKDGSTEGMKILPPIiVVHQDNGDYTET 240 
5 IID++R LAPKR+QFVYPKLGK iUSIMLLIEAIKDGS EGM ILPPLWH++NG+YT+ 

Sbjct: 181 IIDSLRMGLRPKRVQEVYPKLGKSMSMLLIEAIKrXSSIEGMTILPPLVVHK™ 240 

Query: 241 IFDIYFGENGK---SHD 254 
IF+IYFG K +HD 
10 Sbjct: 241 IFEIYFGftASKGKPNHD 257 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2177 

A DNA sequence (GBSx2294) was identified in S.agalactiae <SEQ ID 6727> which encodes the amino
acid sequence <SEQ ID 6728>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0. 1512 (Affirmative) < suco 

bacterial membrane — Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco

25 The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11811 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 40/82 (48%) , Positives = 63/82 (76%) 

Query: 7 YMYVLECSDGTLYTGYTTDVKRRIJmmXSKGaKYTRftRLPV^ 66 
30 + YV++C D + Y GYT D+ +R+ THN GRGAKYT+ R PV+L+++E+F++K+EaM+AE 

Sbjct: 7 FFYVVKCKnNSWYAGYm)iaKRVierHOTX3KlGaKYTKVRRPVELIFflESFSTK^^ 66 

Query: 67 ALFKQKTRQAKLTYIKQHKNEQ 88 
FK+ TR+ K YI++ +N + 
35 Sbjct: 67 YYFKKLTRKKKELYIEEKRNSK 88 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6729> which encodes the amino acid 
sequence <SEQ ID 6730>. Analysis of this protein sequence reveals the following: 

Possible site: 61 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 1838 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 60/84 (71%) , Positives = 67/84 (79%) , Gaps = 1/84 (1%) 

50 Query: 6 AYMYVLECSDGTLYTGYTTDVKRRlJSmraTGKGAKyTRARLPVKLLYSEAFNSKQ 65 

AYMYVLEC D TLYTGYTTD+K+RL THN GKGAKYTR RLPV LLY E F+SK+ AM A 
Sbjct: 6 AYMYVLECVDKTLYTGYTTDLKKRLATHNAGKGAKYTRYRLPVSLLYYEVFDSKEAAMSA 65 

Query: 66 EALF-KQKTRQAKLTYIKQHKNEQ 88 
55 EaLF K+KTR KL YI H+ E+ 

Sbjct: 66 EALFKKRKTRSQKLAYIATHQKEK 89

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2178 

A DNA sequence (GBSx2295) was identified in S.agalactiae <SEQ ID 6731> which encodes the amino 
acid sequence <SEQ ID 6732>. This protein is predicted to be autoaggregation-mediating protein (deaD). 
Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm — Certainty=0. 2287 (Affirmative) < suco 

bacterial membrane — Certainty=0.0000 (Not Clear) < suco

bacterial outside — Certainty=0. 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD20136 GB:AF091502 autoaggregation-mediating protein
[Lactobacillus reuteri] 
Identities = 289/504 (57%) , Positives = 366/504 (72%) , Gaps = 18/504 (3%) 



Query: 1 MKFTELNLSQDILSAVEKAGFVEPSPIQEMTIPLALEGKDVIGQAQTGTGKTAAFGLPTL 60
 MKF+EL LS +L A++++G+ E +PIQE TIP+ LEGKDVIGQAQTGTGKTAAFGLP +
Sbjct: 1 MKFSELGLSDSLLKAIKRSGyEEATPIQEQTIPMVLEGKDVIGQAQTGTGKTAAPGLPII 60

Query: 61 NKIHTEDNTIQaLIIAPTRELAVQSQEELFRFGRDKGVKVRSVyGGSSIEKQIKALRSGA 120
 + TE+ IQA+II+PTRELA+Q+QEEL+R G+DK V+V+ VYGG+ I +QIK+L+
Sbjct: 61 ENVDTENENIQAIIISPTRELAIQTQEELYRLGKDKHVRVQWYGGADIRRQIKSLKQHP 120

Query: 121 HVVVGTPGRLLDLIKRKMjKLNHIETLILDEADEMLNMGFLEDIERIISRVPETRQTLLF 180
 ++VGTPGRL D I R +KL+HI+TL+LDEADEMLNMGFLEDIE+II P+ RQTLLF
Sbjct: 121 QILVGTPGRLRDHINRHTVKLDHIKTLVLDEADEMIiNMGFLEDlESIIKETPDDRQTLLF 180

Query: 181 SATMPDPIKRIGVKFMKDPEHVKIKATELTNVNVDQYYVRVKEIffiKFDTiyr^ 240
 SATMP IKRIGV+FM DPE V+IKA ELT VDQYYVR ++ EKFD MTRL+DV P+
Sbjct: 181 SATMPPEIKRIGVQFMSDPETVRIKAKELTTDLVDQYYVRARDYEKFDIMTRLIDVQDPD 240

Query: 241 LSIVFGRTKRRVDELTRGLKLRGFRAEGIHGDLDQNKRLRVIRDFKNDHIDILVATDVAA 300
 L+IVFGRTKRRVDEL++GL R6+ A GIHGDL Q+KR +++ FKN+ +DILVATDVAA
Sbjct: 241 LTIVFGRTKRRVDELSKGLIARGYNRAGIHGDLTQDKRSKIMWKFKNNELDILVATDV^ 300

Query: 301 RGLDISGVTHVYNYDIPQDPESYVHRIGRTGRAGKSGQSITFVSEISIEMGliliTIIENLTKK 360
 RGLDISGVTHVYNYDIP DP+SYVHRIGRTGRAG G S+TFV+PNEM YL IE LT+
Sbjct: 301 RGLDISGVlTmNYDIPSDPDSYVHRIGRTGRAGHHGVSLTFVTENEMDyiiHEIEKLTRV 360

Query: 361 RMTGMKPATASEAFQAKKKVRLKRIARDFED-QELVSK--FDKFKADALELATQYTPEEL 417
 RM +KP TA EAF+ ++A F D EL+++ D+++ A +L + +L
Sbjct: 361 RMLPLKPPTAEEAFKG QVASAENDIDELIAQDSTDRYEEaAEKLLETHNATDL 413

Query: 418 ALYVLSLTVQDPESLPEVEITREKPLPFKPSGGGFKGKGGRGNGRGGD--RRRNDRGDRR 475
 +L+ ++ S V+IT E+PLP + G R N GG+ RR+N R +
Sbjct: 414 VRACiUSINMTKERASEVFVKITPERPLPRRNKRN- -NRNGNRNNSHGGNHYRRKNFRRHQH 471

Query: 476 GNRDRDDRG SRCDFKRRDDK 495
 G+ D+ G SR F R K
Sbjct: 472 GSHRJSTONHGKSHSSRHSFNIRHRK 495





A related DNA sequence was identified in S.pyogenes <SEQ ID 6733> which encodes the amino acid 
sequence <SEQ ID 6734>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0 . 1108 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside — Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below.

Identities = 430/545 (78%) , Positives = 463/545 (84%) , Gaps = 24/545 (4%) 

Query: 1 MKFTELNLSQDILSAVEKAGFVEPSPIQEMTIPLALEGKDVIGQAQTGTGKTAAFGLPTL 60
10 +KFTE NLSQDI SAV AGF + SPIQEMTIPLALEGKDVIGQAQTGTGKTAAFGLPTL 

Sbjct: 1 LKFTEENLSQDIQSAVOTAGFEKaSPIQEMTIPLftLEGKDVIGQAQTGTGKTAAFGLPTL 60 

Query: 61 NKIHTEDNTIQaLIIAPTRELRVQSQEELFRFGRDKGVKVRSVYGGSSIEKQIKALRSGA 120
NKI T +N IC3AL+IAPTREIAVQSQEELFRFGR+KGVKVRSVYGGSSIEKQIKaL+SGA 
15 Sbjct: 61 NKIRTNENIIQALVIAPTRELAVQSQEELFRFGREKGVKVRSVYGGSSIEKQIKALKSGA 120 

Query: 121 HVWGTPGRLLDLIKRKALKLNHIETLILDEADEMLNMGFLEDIEAIISRVPETRQTLLF 180 

H+WGTPGRLLDLIKRKAL L+H+ETLILDEADEMUSIMGFLEDIEAIISRVP RQTLLF 
Sbjct: 121 HIWGTPGRLLDLIKRKALILDHVETLILDEADEMLNMGFLEDIEAIISRVPADRQTLLF 180 

20 

Query: 181 SATMPDPIKRIGVKFMKDPEHVKIKATELTNVNVDQYYVRVKENEKFDTMTRLMDVDQPE 240 

SATMP PIK+IGVKFMKDPEHV+IK ELTNVNVDQYYVRVKE EKFDTMTRLMDV+QPE 
Sbjct: 181 SATMPAPIKQIGVKFMKDPEHVQIKNKELTNVNVDQYYVRVKEQEKFDTMTRLMDVNQPE 240 

25 Query: 241 LSIVFGRTKRRVDELTRGLKLRGFRAEGIHGDLDQNKRLRVIRDFKNDHIDILVATDVAA 300 

LSIVFGRTKRRVDE+TRGLKLRGFRAEGIHGDLDQNKRIiRVIRDFKND IDILVATDVAA 
Sbjct: 241 LSIVFGRTKRRVDEITRGLKLRGFRAEGIHGDLDQNKRLRVIRDFKNDQIDILVATDVAA 300 

Query: 301 RGLDISGVTHVYNYDIPQDPESYVHRIGRTGRAGKSGQSITFVSENEMGYLTIIENLTKK 360 
30 RGIiDISGVTHVYNYDI QDPESYVHRIGRTGRAGKSG+SITFySENEMGYL++IENLTKK 

Sbjct: 301 RGIiDISGVTHVYNYDITQDPESVVHRIGRTGRAGKSGESITFVSH(IEMGYLSMIENLTKK 360 

Query: 361 RMTGMKPATASEAFQAMOCVALKRIARDFEDQELVSKFDKFKADALELATQYTPEELALY 420 
+M ++PATA EAFQAKKKVALK+I RDF D+ + S FDKFK DA++I1A ++TPEELALY 
35 Sbjct: 361 QMKPLRPATAEEAFQAKKICVALKKIERDFADETIRSNFDKFKGDAVQLaaEFTPEELAL 420 

Query: 421 VLSLTVQDPESLPEVEITREKPLPFKPSGGGF---KGKiGGR6--NGRGGDRRRNDRGDR- 474 

+LSLTVQDP+SLPEVEI REKPLPFK GGG GKGGRG N GDRR RGDR 

Sbjct: 421 ILSLTVQDPDSLPEVEIAREKPLPFKYVGGGHGNKNGKGGRGRDNRNRGDRRGGYRGDRN 480 

40 

Query: 475 RGNRDRDDRGSRCDFKRRDDKFKKDNRRQENKKPHKNTSSEKQTGFVI 522 

R RD D DFKR+ + KD +E K SS K TGFVI 

Sbjct: 481 RDERDGDRRRQKRDKRDGHDGSGNRDFKRKSKRNSKDPFNKEKK SSAKNTGFVI 534 

45 Query: 523 RNKGD 527 

R+KG+ 

Sbjct: 535 RHKGE 539 

A related GBS gene <SEQ ID 8991> and protein <SEQ ID 8992> were also identified. Analysis of this
protein sequence reveals the following:

RGD motif 471-473 

The protein has homology with the following sequences in the databases: 

58.9/74.7% over 494aa
Lactobacillus reuteri
GP|4409804| autoaggregation-mediating protein Insert characterized

ORF01926 (301 - 1785 of 2184) 

GP|4409804|gb|AAD20136.1||AF091502(1 - 495 of 497) autoaggregation-mediating protein
{Lactobacillus reuteri}
%Match =37.3

%Identity =58.8 %Similarity =74.6 

Matches = 290 Mismatches = 118 Conservative Sub.s = 78 

42 72 102 132 162 192 222 252
IRHYITKEIPSEAAVAF*IDKL*TLLLYRWWVFIAFFLFSEATNRTSNL*KRVIY*IDLILYLFTraCOT

282 312 342 372 402 432 ■ 462 492 

Kt3S*GSF^UiSFRKEKHLKFTELNLSQDILSAVEKAGFVEPSpiQE^^^IPLALEGKDVIGQAQTGTGKTi^ 

5 . :||:|| II =1 |:: = :|: 1 :||||.|||: I I I I I I I I I I I I I I I I I I I I I I I = 

MKFSEIXSLSDSLLKAIKRSGYEEATPIQEQTIPMVLEGKDVIGQAQTGTGKTAAFGLPIIEaiJVD 
10 20 30 40 50 60 

522 552 582 612 642 672 702 732 

10 TEDOTIQALIIAPTREmVQSQEELFRFGRDKGVKVRSVYGGSSIEKQIKALRSGftHVWGTPGRLLDLIKRKALKIiNHI 

11= ll|:||:||lll|:|:|ll|:|:hll hh lllh I :|||:|: : = :|llllll I 1 I :l|:|l 
TENPNIQAI I ISPTREIAIQTQEELYRLGKDKHVRVQWYGGADIRRQIKSLKQHPQILVGTPGRLRDHIiniHTVKIiDHI 
80 90 100 110 120 130 140 

15 762 792 822 852 882 912 942 972 

ETLILDEADEMLNMGFLEDIEAIISRVPETRQTLLFSATMPDPIKRIGWFMKDPEHVKIKATELTNVNVDQYYTO 



20 



KTIiVLDEaDEMLNMGFLEDIESIIKETPDDRQTLLFSATMPPEIKRIGVQFMSDPETVRIKAKELTTDLVDQYYVRJ^ 
160 170 180 190 200 210 220 



1002 1032 1062 1092 1122 1152 1182 1212 

EKFDTMTRLM0VDQPELSI\W3RTK3«VDELTRGLKIiRGFRaEGIHGDIjDQNKR^ 

nil IIIMI |:|:|lllllllllllb:|| Ih I llllll hll : : • llh = I I I II I I I I I I I I I 
EKFDimRLIDVQDPDLTIVFGRTKRRVDELSKGLIARGYNAAGIHGDLTQDKRSKIMWKFKK^ 
25 240 250 260 270 280 290 300 

1242 1272 1302 1332 1362 1392 1422 1452 

ISGVTHVYIKDIPQDPESYVHRIGRTGRAGKSGQSITFVSENEMGYLTIIENLTKKRMTGMKPATASEAFQA^^ 

Illllllllllll Ihlllllllllllll I hllhllll II II Ih II :|l II III I :|l 
30 ISGVTHVymDIPSDPDSYVHRIGRTGRAGHHGVSLTFVTPNEMDYIiHEIEKLTRVRMIiPLKPPTAEEAF- -KGQVA- - - 

320 330 340 350 360 370 

1479 1503 1533 1563 1593 1623 1653 1683 

IARDFED-QELVSK--FDKPKADALELATQYTPEELALYVt,SLTVQDPESLPEVEITREKPLPFKPSGGGFKGKGGRGNG 
35 1 I ||::: | : : : | =1 : :| =1: :: | hU hill = 111 

- -SAFNDI0ELIAODSTDRyEEflAEKLLETH^IaTDLVAaLIJJSm'KEAASEVPVKITPERPLPRRl^^ -RNNS 
390 400 410 420 430 440 450 

1707 1737 1755 1785 1815 1845 1875 1905 

40 RGGD- -RRKHDRGDRRGNRDRDDRG SRCDPKRRDDKPKKDKRRQENKKPHKI!ITSSEKQTGFVIRNKGDK*EDYERG 

Ih Ihl I : |: |: I II I I 1 

HGGNHYRRKNFRRHQHGSHRMDNHGKSHSSRHSENIRHRKEN 
470 480 490 
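The header of the alignment above gives the matched region of the ORF as nucleotide coordinates (`ORF01926 (301 - 1785 of 2184)`), while the database hit is given in residues (`1 - 495 of 497`). A short worked calculation relates the two. It assumes the ORF coordinates are 1-based, inclusive nucleotide positions, which the text does not state explicitly; the function name is hypothetical.

```python
def codon_count(start, end):
    """Number of whole codons in the inclusive, 1-based nucleotide range start..end."""
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not precede start")
    if length % 3 != 0:
        raise ValueError("range length is not a multiple of three")
    return length // 3


if __name__ == "__main__":
    # Coordinates copied from the ORF01926 header above.
    print(codon_count(301, 1785))
```

Under that assumption the range 301 to 1785 spans 1485 nucleotides, i.e. 495 codons, which equals the length of the aligned span reported for the database protein.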

There is also homology to SEQ ID 4454.

SEQ ID 8992 (GBS307) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 56 (lane 7; MW 62kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 61 (lane 2; MW 86.7kDa).

The GBS307-GST fusion product was purified (Figure 208, lane 9; Figure 225, lane 10-11) and used to
immunise mice. The resulting antiserum was used for FACS (Figure 272), which confirmed that the protein
is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2179 

A DNA sequence (GBSx2296) was identified in S.agalactiae <SEQ ID 6735> which encodes the amino
acid sequence <SEQ ID 6736>. This protein is predicted to be outer membrane protein (yaeC). Analysis of
this protein sequence reveals the following:

Possible site: 19

>>> May be a lipoprotein

Final Results

bacterial membrane — Certainty=0.0000 (Not Clear) < suco

bacterial outside — Certainty=0.0000 (Not Clear) < suco

bacterial cytoplasm — Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB73036 GB:AL139076 putative periplasmic protein [Campylobacter
jejuni]

Identities = 89/237 (37%) , Positives = 132/237 (55%) , Gaps = 3/237 (1%) 

Query: 40 IWATYSKPTSTFLDLVKDNVKEKGTrLKWWSDYIQANIALEINKEHDANLLQHEFFMS 99 
15 IT+ P + L+L+KD+ K KGY LK+V SDYI N ALE KE DANL QH+ F+ 

Sbjct: 23 ITIGATPNPPGSrOlBLMKDDFKNKBYELKIVEFSDyiLPNRAriEEKELDflNL 82 

Query: 100 IFNKENIX3HLVSITPIYHSLAGFyGQH]:.KNIAELKI(GAKVAIPSDPMIMTRALLLLQEKK 159 
+N + +L++ TP+ + G Y + +KN+ LK+GA+VAIP+D N +RAL rjL++ K 
20 Sbjct: 83 EYNLKKX3SNLIATTPVLIAPVGVYSKKIKNLENLKEGftRVAIENimTIffiSR2a,ELI^^ 142 

Query: 160 LITLKNTTSKKTKAIEDIITNPKKIiRIEPVALU^QAYFEYDLVFNPPGYVTKINLVPKR 219 

LX L + KT DI NPKKL+ + ]:i+A+D+ + LP + 

Sbjct: 143 LIEIiNKNTLKTPL--DINKNPKKLKFIELKaaQLPRAIiDDVDIAIINSNFALGRGLN^ 200 

25 

Query: 220 DRLLYEKKPDIRFAGALV7«EnNKNSDKIKVLKEVLTSKEIRHYITKEIPSEAAVAP 276 

D + EK + + +VR + KNS+K KV+ E+L S + + I + AF 
Sbjct: 201 DTIFREDK-NSPyVWYVWRSEGKNSEKTKVIDEILRSDKPKAIINEHYKDILIPAF 256 

SEQ ID 6736 (GBS126) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 34 (lane 7; MW 32kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2180 

A DNA sequence (GBSx2297) was identified in S.agalactiae <SEQ ID 6737> which encodes the amino
acid sequence <SEQ ID 6738>. This protein is predicted to be probable permease of ABC transporter.
Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.99 Transmembrane 190 - 206 ( 187 - 215)

INTEGRAL Likelihood = -8.44 Transmembrane 25 - 41 ( 16 - 45) 

INTEGRAL Likelihood = -6.48 Transmembrane 69 - 85 ( 68 - 90) 

INTEGRAL Likelihood = -3.77 Transmembrane 90 - 106 ( 88 - 109) 

INTEGRAL Likelihood = -1.44 Transmembrane 145 - 161 ( 145 - 161) 
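The five `INTEGRAL` lines above each report a likelihood score, a core transmembrane segment and, in parentheses, an extended range. A minimal sketch for extracting these values is given below; the names are hypothetical, and the pattern expects the keyword to be spelled `Transmembrane`, so lines damaged in transcription would need correcting first.

```python
import re

# Example line: "INTEGRAL Likelihood = -8.44 Transmembrane 25 - 41 ( 16 - 45)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return a list of dicts, one per INTEGRAL line, sorted by core start."""
    segments = []
    for m in TM_RE.finditer(text):
        segments.append(
            {
                "likelihood": float(m.group(1)),
                "core": (int(m.group(2)), int(m.group(3))),
                "extended": (int(m.group(4)), int(m.group(5))),
            }
        )
    return sorted(segments, key=lambda seg: seg["core"][0])


def core_length(segment):
    """Length in residues of the core segment (inclusive coordinates)."""
    start, end = segment["core"]
    return end - start + 1
```

For the block above this yields five segments ordered along the sequence, each with a 17-residue core.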




Final Results 

bacterial membrane Certainty=0 . 5798 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG08889 GB:AE004963 probable permease of ABC transporter 
[Pseudomonas aeruginosa] 
Identities = 80/206 (38%) , Positives = 127/206 (60%) , Gaps = 4/206 (1%) 

Query: 15 SFWETraM^LTLILCFLIAFPTGILLFSLRKSYLIKHSLAYQLLNLFLGTLRSVPFLIF 74 

+FW MLG +L+ ++ P G+LLF + + Y LL+L + LRS+PF+I 

Sbjct: 24 TFW MLGGSLLFTWLGLPLGVLLFLTGPRQMFEQKAVYTLLSLWNILRSLPFIIL 79

Query: 75 IFILIPIiNRLIFGTSFGTIAAILPLTLVSVSLYARYVEQALIMIPQVWDRALSLGANKR 134

+ ++IPL LI GTS G AI Pii + + +AR VE AL + + +++ ++GA+ R 
Sbjct: 80 LIWIPLTVLITGTSLGVAGAIPPLWGATPFFARLVETALREVDKGIIEATQAMGASTR 139 

5 

Query: 135 QIIYYPLIPSIKIDLVLSFTATAlSILGYSTIMGVIGRGGLGEYAYRFGYQEyDYPVMYL 194 

QII+ L+P + ++ + T TAI+++ Y+ + GV+GAGGLG+ A RFGYQ + VM + 
Sbjct: 140 QIIWNALLPEARPGIIAAITVTAITLVSYTAMAGWGAGGLGDIiAIRFGYQRFQTDVMW 199 

10 Query: 195 IWLFIIYVFILQSLGYFIANRYSRK 220 

W+ +1 V ILQ++G + +SRK 
Sbjct: 200 TWMLLILVQILQTVGDKLWHFSRK 225 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2181 

A DNA sequence (GBSx2298) was identified in S.agalactiae <SEQ ID 6739> which encodes the amino
acid sequence <SEQ ID 6740>. This protein is predicted to be ABC transporter, ATP-binding protein
(oppF). Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5454 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9333> which encodes amino acid sequence <SEQ ID 9334> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC22280 GB:U32744 ABC transporter, ATP-binding protein
[Haemophilus influenzae Rd]
Identities = 62/174 (35%), Positives = 104/174 (59%), Gaps = 2/174 (1%)

Query: 1 MKMINGLIPYDKGNIYYQGKEVKSFSDimiRQMRKDIAYIFQNHNLrAGESVYYHLALVY 60
         ++ +N L G++ G E+ SD +Ii R+ I IFQ+ NLL+ +V+ ++AL
Sbjct: 48 IRCVHLLEKPTSGSVIVDGVELTKLSDRELVIiyRRQICaMIFQHEm,LSSRTVPENVALPL 107

Query: 61 KrOTQKVN--HDAINDILDFLGLMDLKQVKCHSLSGGQQQKVaiAMAVLQKPKI,ILCDEI 118
          . +L + +1 +LD +Gi:i + + +LSGGQ+Q+VAIA A+ PK++LCDE
Sbjct: 108 ELESESKAKIQEKITALLDLVGLSEKRDAYPSNLSGGQRQRVAIARALASDPKVLLCDEA 167

Query: 119 SSALDTNSEKEIFNLLSDLREKYGISILMIAHHLSLLRQYCDRVMILDHQTIVD 172
           +SALD + + I LL ++ GI+IL+I H + ++KQ CD+V ++D +V+
Sbjct: 168 TSALDPATTQSILKLLKEINRTLGITILLITHEMEWKQICDQVAVIDQGRLVE 221

There is also homology to SEQ ID 76. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2182

A DNA sequence (GBSx2299) was identified in S.agalactiae <SEQ ID 6741> which encodes the amino
acid sequence <SEQ ID 6742>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2183 

A DNA sequence (GBSx2300) was identified in S.agalactiae <SEQ ID 6743> which encodes the amino 
acid sequence <SEQ ID 6744>. Analysis of this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0904 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9741> which encodes amino acid sequence <SEQ ID 9742> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB87515 GB:AF034138 unknown [Bacillus subtilis]
Identities = 74/125 (59%), Positives = 92/125 (73%)

Query: 5 MGIFSGLMGNASQMDTDK^TENQLSDILISDEQVDIAYTLIRDLIVFTNYRLILVDEQGVT 64
         MG GL+GNAS + T V+ +L+ IL+ E+V+ A+ L+Rr)LIVFT+ RLILVDKQG+T
Sbjct: 1 MGFIDGLLGNASTLSTAAVQEELAHILLEGEKVEAAFKLVRDLIVFTDKRLILVDKQGIT 60

Query: 65 GKKVSYNSIPYASISRFTVETSGHFDLDAELKIWISSAIEPAEVLQFKNDRNIVSIQKAL 124
          GKK + SIPY SISRF+VET+G FDLD+ELKIWIS A PA QFK D +1 IQK L
Sbjct: 61 GKKTEFQSIPyKSISRFSVETAGRFDICSELKIWISGAELPAVSKQFKKDESIYDIQKVIi 120

Query: 125 ATAVL 129
           A +
Sbjct: 121 AAVCM 125

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2184 

A DNA sequence (GBSx2301) was identified in S.agalactiae <SEQ ID 6745> which encodes the amino
acid sequence <SEQ ID 6746>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0921 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9331> which encodes amino acid sequence <SEQ ID 9332>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA74739 GB:Y14370 peptide chain release factor 3 
[Staphylococcus aureus] 
Identities = 274/462 (59%), Positives = 349/462 (75%), Gaps = 9/462 (1%)

Query: 1 ^roIEKQRGISVTSSVMQFDYAGKRWIIlD^PGHEDFSEDTYRTI^VDAAVMVVDSAKGI 60
         M +E++RGISVTSSVMQFDY +NILDTPGHEDFSEDTYRTLMAVD+AVMV+D AKG+
Sbjct: 57 MKVEQERGISOTSSWQPDYDDVEINILDTPGHEDFSEDTYRTLMAVDSAVMVIDCAKGV 116

Query: 61 EAQTKKLFEVVKHRNIPVFTFINia,DMX3REPIJ3LLEELEEVIX3IASYPMN^ 120
          E T KLF+V K R IP+FTFINKLDR G+EP +LL+E+EE L I +YPMNWPIGMG+SF
Sbjct: 117 EPPTLKLFKVCKtroGIPIFTFINKLDRTCKEPFELLDEIEETIJIIETYPMNWPIGMGQSF 176

Query: 121 EGLYDLHNKRLELYKGDERFASIEDG DQLFANNPFYEQVKEDIELLQEAGNDFSE 175
           G+ D +K +E ++ +E + D D N+ +EQ E++ L++EAG F
Sbjct: 177 FGIIDRKSKTIEPFRDEENIIiHIiNDDFELEEDHAITNDSDFEQAIEELMLVEEAGEAFDN 236

Query: 176 QAILMDLTPVFFGSALTOFGVQTFIOTFLEFAPEPHGHKTTEGOTIDPIAKDFSGFVFK 235
           A+L GDLTPVFFGSAL NFGVQ FL+ +++FAP P+ +T E + P FSGF+FK
Sbjct: 237 DALLSGDLTPVFFGSALANFGVQNFimYVDFAPMPNRRQTKENVEVSPFDDSFSGFIFK 296

Query: 236 IQaNMDPRHRDRIAFVRIVSGEFERGMGVNLTRTGKGAKLSNVTQFMAES - RENVTNAVA 294
           IQANMDP+HRDRIAF+R+VSG FER + + L +K S+V + + ++++ V +AVA
Sbjct: 297 IQRNMDPKHRDRIAFMRWSGAFER-VVmLClWLIKSKRSHVQRHLWQTIKKLVNHA^ 355

Query: 295 GDIIGVYDTGTYQVGDTLTVGKNKFEFEPLPTFTPELFMKVSAKNVMKQKSFHKGIEOLV 354
           GDIIG+YDTG YQ+GDTL GK + F+ LP PTPE+FMKVSAKNVMKQK FHKBIEQLV
Sbjct: 356 GDIIGLYDTGNYQIGDTLVGGKQTYSFQDLPQFTPEIFMKVSAKNVMKQKHFHKGIEQLV 415

Query: 355 QEGAIQLYKNYQTGEYMLGAVGQLQFEVFKHRMEGEYNAEWMTPMGKKTVRW- - INSDD 412
           QEGAIQ YK T + +LGAVGQLQFEVF+HRM+ EYN +WM P+G+K RW N D
Sbjct: 416 QEGAIQYYKTLHTNQIILGAVGQI^FEWEHRMKlffiYMfVDVVMEPVGRKIARWDIENEDQ 475

Query: 413 IDERMSSSRNILAKDRFDQPVFLFENDFALRWPADKYPDVKL 454
           + ++M++SR+IL KDR+D VFLFEN+FA RWP +K+P++KL
Sbjct: 476 ITDKMNTSRSILVKDRYDDLVFLFENEFATRWPEEKFPEIKL 517

A related DNA sequence was identified in S.pyogenes <SEQ ID 6747> which encodes the amino acid 
sequence <SEQ ID 6748>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2070 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 447/458 (97%), Positives = 455/458 (98%)

Query: 1 MDIEKQRGISVTSSVMQFDYAGKRVNILDTPGHEDFSEDTYRTLMAVDAAVMWDSAKGI 60
         MDIEKQRGISWSSVMQFDYAGKRVNILDTPGHEDFSEDTYRTUyiAVDAAVMVVDSAKGI
Sbjct: 57 miEKQRGISVTSSVMQFDYAGKRVNILDTPGHEDFSEDTYRTLMAVDAAVMVVDSAKGI 116

Query: 61 EAQTKKLPEVVKHRNIPVFTPINKLDRDGREPLDLLEELEEVLGIASYPMNWPIGMGKSF 120
          EAQTKKLFEVVKHRNIPVFTFINKI1DRDGREPL+LLEELEEVLGIASYPMNWPIGMG++F
Sbjct: 117 EAQTKKIiFEVVKHRNIPVFTFINKLDRIXSREPLELIiEELEEVLGIASYPiyiNWPIGMGRAF 176

Query: 121 EGLYDLHNKRLELYKGDERFASIEDGDQLFAKNPFYEQVKEDIELLQEAGNDFSEQAILD 180
           EGLYDLHNKRLELYKGDERFASIEDGDQLFANNPFYEQVKEDIELLQEAGNDFSEQAILD
Sbjct: 177 EX3LYDLHNK3?LELYKGDERFASIEDGDQLFflNNPFyEQVKEDIELLQEAGNDFSEOaiLD 236

Query: 181 GDLTPVPFGSALTNFGVQTFLDTFLEPAPEPHGHKTTEGNVIDPLaKDFSGFVFKIQaNM 240
           GDLTPVFPGSaLTNFGVQTFUJTFLEFAPEPHGHKTTEGNV+DPUiKDFSGFVPKIQftK^
Sbjct: 237 GDLTPVFFGSaLTNFGVQTFLDTFLEFAPEPHGHKTTEGNVVDPLAKDFSGFVFKIQRlSM 296

Query: 241 DPRHRDRIAFWIVSGEFERGMGVlSn^TRTGKGAKliSWrQFMAESRENVTNAVAGDIIGV 300
           DP+HRDRIAFVRIVSGEFERGMGVNLTRTGKGAKLSNVTQFMAESRENVTNAVAGDIIGV
Sbjct: 297 DPKHRDRIAFVRIVSGEFERGMGVNLTRTGKGAKLSim'QFMMSRENVTmVAGDIIGV 356

Query: 301 YDT6TYQVGDTLTVGKNKFEFEPLPTFTPELFMKVSAKNVMKQKSPHRGIEQLVQEGAIQ 360
           YDTGTYQVGDTLTVGKNKFEFEPLPTFTPE+FMKVS KNVMKQKSFHKGIEQLVQEGAIQ
Sbjct: 357 YDTGTYQVGDTLTVGKNKFEFEPLPTFTPEIEMKVSPKNVMKQKSFHKGIEQLVQEGAIQ 416

Query: 361 LYKNYQTGEYMLGAVGQLQFEVFKHRMEGEYNAEWMTPMGKKTVRWINSDDLDERMSSS 420
           LYKNYQTGEYMLGAVGQLQFEWKHRMEGEYNAEVVMTPMGKaCi™WI+ DDLD+RMSSS
Sbjct: 417 rIYKNYQTGEY^«^GAVGQLQFEWKHRMEGEY]Sffi^OTMPMGKK^ 476

Query: 421 RNILAKDRFDQPVFLFENDFALRWFADKyPDVKLEEKM 458
           RNILAKDRFDQPVFLFENDFALRWFADKYPDV LEEKM
Sbjct: 477 RNILaKDRFDQPVFLFENDFALRWEADKyPDVTIiEEKM 514

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2185 

A DNA sequence (GBSx2302) was identified in S.agalactiae <SEQ ID 6749> which encodes the amino 
acid sequence <SEQ ID 6750>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3061 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC38046 GB:AF000954 No definition line found [Streptococcus mutans]
Identities = 122/142 (85%), Positives = 138/142 (96%)

Query: 1 NmEFAAQKTGKSMKEMAVTFVTNERSHEIiNLEYRDTDRPTDVISIiEYKPEVDISFDEEDL 60
         +LEFAaQKroKE+KEMAOTPVTNERSHELNL+YRDT+RPTDVISLEYKPE +SFDEEDL
Sbjct: 23 ILEFAAQKTGKEDKEMAVTFVTtffiRSHEIJSn^KYRDTNRP^ 82

Query: 61 AENPELAEMLEDFDSYIGELFISIDKAKEQaEEYGHSYEREMGFLAVHGFLHINGYDHYT 120
          A++P+I1AE+L +FD+YIGELFlS+DKA+EQA+EYGHS+EREMGFLAVHGFIiHINGYDHYT
Sbjct: 83 ADDPDLftEVLTEPDaYIGELFISVDKaREQAQEYGHSFEREMGPLAVHGFLHINGYDHYT 142

Query: 121 PEEEKEMFSLQEEILTAYGLKR 142
           P+EEKEMFSLQEEIL AYGLKR
Sbjct: 143 PQEEKEMFSLQEEILDAYGLKR 164

There is also homology to SEQ ID 120.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2186

A DNA sequence (GBSx2303) was identified in S.agalactiae <SEQ ID 6751> which encodes the amino 
acid sequence <SEQ ID 6752>. Analysis of this protein sequence reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-15.39 Transmembrane 108 - 124 ( 100 - 131)
INTEGRAL Likelihood = -8.92 Transmembrane 61 - 77 ( 52 - 82)
INTEGRAL Likelihood = -5.36 Transmembrane 41 - 57 ( 40 - 60)
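
Each INTEGRAL line above gives a likelihood value and the residue range of a predicted membrane-spanning segment, followed by a second range in parentheses; the listing runs from the most negative likelihood value to the least. A minimal sketch, assuming only the line layout shown above, of how the three segments could be tabulated and reordered by position along the sequence. The helper name `parse_integral_lines` and the field names `core` and `extended` are hypothetical labels chosen for illustration, and the code is not part of the original analysis.

```python
import re

# Hypothetical parser for lines such as
# "INTEGRAL Likelihood =-15.39 Transmembrane 108 - 124 ( 100 - 131)".
# The meaning of the two ranges is assumed from the listing itself.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral_lines(text):
    segments = []
    for score, start, end, outer_start, outer_end in TM_RE.findall(text):
        segments.append({
            "likelihood": float(score),
            "core": (int(start), int(end)),            # first range on the line
            "extended": (int(outer_start), int(outer_end)),  # range in parentheses
        })
    return segments

listing = """
INTEGRAL Likelihood =-15.39 Transmembrane 108 - 124 ( 100 - 131)
INTEGRAL Likelihood = -8.92 Transmembrane 61 - 77 ( 52 - 82)
INTEGRAL Likelihood = -5.36 Transmembrane 41 - 57 ( 40 - 60)
"""

segments = parse_integral_lines(listing)
# Order the segments by position along the protein rather than by score.
by_position = sorted(segments, key=lambda seg: seg["core"][0])
for seg in by_position:
    length = seg["core"][1] - seg["core"][0] + 1
    print(seg["core"], length, seg["likelihood"])
```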

Final Results

bacterial membrane --- Certainty=0.7156 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC38047 GB:AF000954 diacyglycerol kinase [Streptococcus mutans]
Identities = 107/133 (80%), Positives = 121/133 (90%), Gaps = 2/133 (1%)

Query: 1 MDLNra--NHKKWKlSIRTLTSS^ffiFAVTGIFTAFKEERMMRKHLVSAILVIIlAGLTFQVSM 58
         MDL DN + KKWKNRTLTSS+EFA+TGIFTAFKEERNM+KH VSA+L ++AGL F+VS+
Sbjct: 3 MDLRDNKQSQKKWKNRTLTSSIBFALTGIFTAFKEERNMKKHAVSALLAVIAGLVFKVSV 62

Query: 59 VEWLFLLLSIFLVITFEIINSAIENWDLASNYHFSMLAKNAKDMAAGAVLWSLFAVLV 118
          +EWLFLLLSIFLVITFEI+NSAIENWDLAS+yHFSiyiLAKNAKDMaAGAVLV+S FA L
Sbjct: 63 lEWLFLLLSIFLVITFEIVNSAIENWDLASDYHFSMLRKNAKDMAAGRVLVISSFAALT 122

Query: 119 GLIIFIPKILALL 131
           GLIIF+PKI LL
Sbjct: 123 GLIIFVPKIWFLL 135

A related DNA sequence was identified in S.pyogenes <SEQ ID 6753> which encodes the amino acid 
sequence <SEQ ID 6754>. Analysis of this protein sequence reveals the following:

Possible site: 34 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.67 Transmembrane 63 - 79 ( 41 - 84)
INTEGRAL Likelihood = -7.32 Transmembrane 110 - 126 ( 105 - 129)
INTEGRAL Likelihood = -5.41 Transmembrane 43 - 59 ( 41 - 62)

Final Results

bacterial membrane --- Certainty=0.5267 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC38047 GB:AF000954 diacyglycerol kinase [Streptococcus mutans]
Identities = 104/135 (77%), Positives = 119/135 (88%)

Query: 1 MALHDNNTTKRKWKNRTITSSLEFALTGVFTAFKEERNLRSHLLSACLACVAflLFFSISA 60
         M L DN +++KWKNRT+TSSLEFALTG+FTAFJCEERN++ H +SA LA +AGL F +S
Sbjct: 3 MDLRDNKQSQKKWKMRTLTSSLEFALTQIFTAFKEERNMKKHAVSALLAVIAGLVFKVSV 62

Query: 61 lEWLFLLLAIFLVITLEIVNSAIENVVDLASDYHFSMLAKNAKDMAAGAVLMISGYAVLT 120
          lEWLFLLL+IFLVIT EIVNSAIENWDLASDYHFSMLAKNAKDMAAGAVL+ISG+A LT
Sbjct: 63 lEWLFLLLSIFLVITFEIVNSAIENVVDLASDYHFSMLAKNAKDMAAGAVLVISGFAALT 122

Query: 121 GLIIFIPKIWNIFVH 135
           GLIIF+PKIW + H
Sbjct: 123 GLIIFVPKIWFLLFH 137



An alignment of the GAS and GBS proteins is shown below.

Identities = 98/129 (75%), Positives = 115/129 (88%), Gaps = 2/129 (1%)

Query: 1 MDLNDHN--HIOCWramTLTSSMEFAVTGIFTAFKEERNMRKHLVSAILVILAGLTFQVSM 58 

M L+DNN +KWKNRT+TSS+EFA+TG+FTAFKEERN+R HL+SA L +AGL F +S 
Sbjct: 1 MALHDNim'KRKmNRTITSSriEFALTGWTAFKEERNLRSHLLSACI^ 60 

Query: 59 VEWLPLLLSIPLVITFEIINSAIEimDIJ^iraiFSMLAKNAK^ 118 

+EWIiPI.LL+IPLVIT EI+JSSAIENVVDIiAS+raPSMDSJQiaKEJMaAG^ +AVL 
Sbjct: 61 lEWLPLLLAIPIiVITLEIVNSMENVVDLASDyHFSMIAKNAKDMRAGAVLMISGYAVL^ 120 

Query: 119 GLIIFIPKI 127 

GLIIFIPKI 
Sbjct: 121 GLIIFIPKI 129 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2187 

A DNA sequence (GBSx2304) was identified in S.agalactiae <SEQ ID 6755> which encodes the amino 
acid sequence <SEQ ID 6756>. This protein is predicted to be GTPase Era (era). Analysis of this protein 
sequence reveals the following:

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1871 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10017> which encodes amino acid sequence <SEQ ID
10018> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD41632 GB:AF072811 GTPase Era [Streptococcus pneumoniae]
Identities = 273/299 (91%), Positives = 290/299 (96%)

Query: 16 MTFKSGFVAILGRPNVGKSTFLNHVMGQKIAIMSDKAQTTRNKIMGIYTTETEQIVFIDT 75
          MTFKSGFVAILGRPNVGKSTFLNHVMGQKIAIMSDKAQTTKNKIMGIYTT+ EQXVFIDT
Sbjct: 1 MTFKSGFVAILGRPNVGKSTFLNHVMGQKIAIMSDKAQTTRNKIMGIYTTDKEQIVPIDT 60

Query: 76 PGIHKPKTALGDFMVESAYSTLREVETVLFMSTPMEKRGKGDDMIIERLKaAKIE^ 135
          PGIHKPRTALGDPMVESAYSTLREV+TVLPMVPADE RGKGDDMIIERLKAAK+PVIIjV+
Sbjct: 61 PGIHKPKTALGDFMVESAYSTLREVDTVLPMVPADEARGKBDDMIIERLKAAKVPVILVV 120

Query: 136 NKIDKVHPDQLLEQIDDFRSQMDFKEWPISALQGNNVPTLIKLLTDNLEEGFQYFPEDQ 195
           NKIDKVHPDQLL QIDDFR+QMDFKE+VPISALQGNNV L+ +L++NL+EGFQYFP DQ
Sbjct: 121 NKIDKVHPDQLLSQIDDPRNQMDPKEIVPISALQGNNVSRLVDILSENLDEGPQYFPSDQ 180

Query: 196 ITDHPERFLVSENWTJEKOTiHLTQQEVPHSVRWVESMKRDEETDKVHIRATIMVERDSQK 255
           ITDHPERFLVSErTVREKVLHLT++E+PHSVAVVV+SMKRDEETDKVHIRATIMVERDSQK
Sbjct: 181 ITDHPERFLVSEMVREKVLHLTREEIPHSVAWVDSMKRDEETDKVHIRATIMVERDSQK 240

Query: 256 GIIIGKQGAMLKKIGKMARRDIEIMKSDKVYLETWVKVKKISIWRDKK]:^ 314
           GIIIGK GRMLKKIG MARRDIEXJ^LGDKV+LETWVKVKKNWRDKKtDLADFGYNE+EY
Sbjct: 241 GIIIGKGGaMLKKIGSMARRDIEIJlLGDKVFLETWVKVICKNWRDKKIDIJ^ 299



A related DNA sequence was identified in S.pyogenes <SEQ ID 6757> which encodes the amino acid
sequence <SEQ ID 6758>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 295/297 (99%), Positives = 296/297 (99%)

Query: 18 FKSGWAILGRENVGKSTFIJSHVMGQKIAIMSDKaQTTmJKIMGIOT^ 77
          FKSGPVAILGRE^M3KSTFIlNH™GQKIAIMSDKAQTTRlIKIMGIYTT
Sbjct: 2 FKSGFVAIIXSRPNVGKSTFLtraVMGQKIAIMSDKAQTTFNKIMGIYTTO 61

Query: 78 IHKPKTALGDFMVESAYSTLREVETVLFMVPADEKRGKGDDMIIERLKAAKIPVILVINK 137
          IHKPKTALGDFMVESAYSTLREVETVLFMVPADEKRGKGDDMIIERLKAAKIPVILVXNK
Sbjct: 62 IHKPKTALGDFMVESAYSTLREVETVLFMVPADEKRGKGDDMIIEaiLKAAKIPVILVINK 121

Query: 138 IDK\mPDQ]:iLEQIDDFRSQMDFKEVVPISRI<3GNNVT>TLIKi:ir.T^ 197
           IDKVHPDQLLEQIDDF SQMDFKEWPISAL+GNNVPTLIKLLTDNLEEGFQYFPEDQIT
Sbjct: 122 IDKVHPDQLLEQIDDFHSQmFKEVVPISALEGNNVPTLIKLLTDlTCiEEGFQYFPEDQI 181

Query: 198 DHPERFLVSEMVREKVLHLTQQEVPHSVAWVESMKRDEETDKVHIRATIMVERDSQKGI 257
           DHPERFLVSE^TO^EKVLHLTQQEVPHSVAWVESMKRDEETDKVHIRATIMVERDSQKGI
Sbjct: 182 DHPERFLVSE^n^ffiK^7LHLTQQEVPHSVaVVVESMKRDEETDKVHIRATIMVERDSQKGI 241

Query: 258 IIGKQGRMLKKIGKMARRDIEUII/SDKVYLETWVKVKKNWRDKKIiDrM 314
           IIGKQGAMLKKIGKMftRRDIELMLGDKVYLETWVKVKKNWRDKK^
Sbjct: 242 IIGKQGMDKKIGKMARRDIErMLGDKVYLETWVKVKK^™a^KKm 298

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2188 

A DNA sequence (GBSx2305) was identified in S.agalactiae <SEQ ID 6759> which encodes the amino 
acid sequence <SEQ ID 6760>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2679 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2189 

A DNA sequence (GBSx2306) was identified in S.agalactiae <SEQ ID 6761> which encodes the amino
acid sequence <SEQ ID 6762>. Analysis of this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA16793 GB:D90900 hypothetical protein [Synechocystis sp.]
Identities = 36/119 (30%), Positives = 57/119 (47%), Gaps = 15/119 (12%)

Query: 390 TSDYEKAKVIHDHLVNNYTYATEEIATTRETASGISIHAPEALYKDKRGVCQAFAVMFKD 449
           ++D+E+A++ + + N Y +A TR I PE + +C ++ +++
Sbjct: 153 SNDWEEARLAYSWITQNIAYDVP-MAETRN IDDLRPETVLftRGETICSGYSNLYQA 207

Query: 450 MAATAGLSVWYVTGQAGGG NimWNIVTINGVKYYVDTTWDNNIKSNKYF 498
           +A GL V + G A GG NHAWN V I+G Y +DTTW I S+ F
Sbjct: 208 lAKELGLDWIIEGFAKGGDVIVGDDPDVNHAWNGTOIDGQWYLLDTTWGAGIVSDGKF 266

A related DNA sequence was identified in S.pyogenes <SEQ ID 6763> which encodes the amino acid 
sequence <SEQ ID 6764>. Analysis of this protein sequence reveals the following:

Possible site: 23 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

An alignment of the GAS and GBS proteins is shown below.

Identities = 41/181 (22%), Positives = 79/181 (42%), Gaps = 17/181 (9%)

Query: 355 ITITYTLKGDMVGLHKEYKQFVDSFVKENITNKNITSDYEKAKVIHDHLVNNYTYATE-- 412
           + +T+ + D ++++ Q + + + N +K+ YE+ K ++ ++ + Y +
Sbjct: 124 VF\7TFPIPEDAKNIYQDL-QAI(aiDIVANTPSKD---RYEQVKYFYEVIIRDTDYNKKAF 179

Query: 413 EIATTRETASGISlHAPEaLyKDKRGVCQaFAVMFKDMAATAGLSVWyVTGQaGGGN- - - 469
           E + A S ++++ D VC +A F+ + AG+ V Y+ G
Sbjct: 180 EAYQSGSQAQVASNQDIKSVFIDHLSVCNGYAQAFQFLCQKAGIPWAYIRGTGTSQQPQQ 239

Query: 470 HAWNIVTINGVKYYVDTTW DNNIKSNKYFLVGKTIMDADHLLDSQYNALAKDI 522
           HAWN V IN Y VD TW DN++ K + + + L + + +KDI
Sbjct: 240 SFAHAWNAVQINNTYYGVDVTWGDPVFDNHLSHQKQGTINYSFLCLPDYLMALSHQPSKDI 300

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2190

A DNA sequence (GBSx2307) was identified in S.agalactiae <SEQ ID 6765> which encodes the amino 
acid sequence <SEQ ID 6766>. This protein is predicted to be rgg protein. Analysis of this protein sequence 
reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 187 - 203 ( 187 - 203)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10015> which encodes amino acid sequence <SEQ ID
10016> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA26968 GB:M89776 rgg [Streptococcus gordonii] 
Identities = 71/273 (26%), Positives = 140/273 (51%), Gaps = 16/273 (5%)

Query: 8 KELGKTLRRLRKGKKVSISSLADEHLSKSQISRFERGESElTCSRLIiNILDKIiNITIDEF 67
         K GK L+ +R+ K +S+ +A +S +Q+SR+BRG S +T + L +++++ EF
Sbjct: 5 KSSGKILKIIRESKNMSLKEVAAGDISVAQLSRYERGISSLTVDSFYSCLRl^SVSLAEF 64

Query: 68 VSI -HSKAHTHFFILIiNRVRKYCSffiKNVTKLVALL EDHNHKDYEKIMIK 115
          + H+ +L ++ + E N+ KL ++L E N+K I+I+
Sbjct: 65 QYVinnraiEaDDVVLSQKLSEAQREmiVKl^SII^SEaMAQEFPEKK^ 123

Query: 116 ALIFSIDQSIEPNQEELARLTDYLFTVEQWGYYEIIIiLGNCSRLINYNTLFLLTKEMVNS 175
           A + S + + ++ ++ LTDYLF+VE+WG YE+ L N L+ TL EM+N
Sbjct: 124 ATLTSaSPDYQVSRGDIEFLTDTLPSVEEWGRYELWLFTNSVNLLTLETLETFASEMINR 183

Query: 176 FAYSEQimWKILOTQLAINCLIISIDHSYPEHSHYLIDKVRSLlXPEVNFYEKIOTLYV 235
           + N+ + ++ +N + + + ++ + + E + Y++ + Y
Sbjct: 184 TQFYNNLPENRRRIIKmiiNWSACIENirariQVAMKFLNYIDM' 243

Query: 236 TGYYHLKLGDTSSGKEDMRKALQIFKYLGEDSF 268
           Y K+G+ •+ + D+ + L F+YL DSF
Sbjct: 244 KALYSYKVGNPHA-RHDIEQCLSTFEYL--DSF 273

There is also homology to SEQ ID 628. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2191 

A DNA sequence (GBSx2308) was identified in S.agalactiae <SEQ ID 6767> which encodes the amino 
acid sequence <SEQ ID 6768>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3234 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA05066 GB:D26071 formamidopyrimidine-DNA glycosylase
[Streptococcus mutans]
Identities = 182/271 (67%), Positives = 217/271 (79%)

Query: 1 MPELPEVETVRKGLERLVXnjQEIASITIKOTKMVKTDLNDFMISLPGKTIQQVLRRGKYL 60
         MPELPEVETVR+GLE L+V ++I S+ ++VPKMVKT + DF + + G+T + + RRGKYL
Sbjct: 1 MPELPEVETVRRGLEHLIVGKKIVSVEVRVPKMVKTGVEDFQLDILGQTFESIGRRGKYL 60

Query: 61 IiFDFGEMVIWSHLRMEGKYLLFENKVPDNKHFHr.YFKLTNGSTLVYQDVRKFGTFELVRK 120
          L + ++SHI,RMEGKYLLF ++VPDNKHFHL+F L GSTLVyQDVRKFGTFEL+ K
Sbjct: 61 LIMIlIRQTIISHIJaffiGKYLIiFEDEVPDNKHFHLFFGIJDGGSTLVYQDVRKFGTFELL^^ 120

Query: 121 SSLKDYFTQKKLGPEPTADTFQFEPFSKGIjaiSKKPIKPLIJ^QRLVAGLGNiyVDEVLW 180
           S ++ YF QKK+GPEP A F+ +PF +GIiA S K IK LIiLDQ LVAGLGNIYVDEVLW
Sbjct: 121 SQVEAYFVQKKIGPEPNAKDFKLKPFEEGLAKSHKVIKTLLIJ3QHLVAGLGNIYVDEVLW 180

Query: 181 AAKIHPQRLftNQLTESETSLLHKEIIRILTLGIEKGGSTIRTYIOaALGEDGTMQKYLQVY 240
           AAK+ P+RIA+QL SE +H E IRIL L lEKGGSTIR+YKN+LGEDG+MQ LQVY
Sbjct: 181 AAKVpPERLASQLKTSEIKRIHDETIRILQLAIEKCSGSTIRSYKNSLGEDGSMQDCLiQVY 240

Query: 241 GKTGQPCPRCGCIiIKKIKVGGRGTHYCPRCQ 271
           GKT QPC RC I+KIKVGGRGTH+CP CQ
Sbjct: 241 GRrDQPCftRCATPIEKIKVGGRGTHFCPSCQ 271

A related DNA sequence was identified in S.pyogenes <SEQ ID 6769> which encodes the amino acid 
sequence <SEQ ID 6770>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2068 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 190/271 (70%), Positives = 229/271 (84%)

Query: 1 MPELPEVETVRKGLERLVVIvIQEIASITIKVPKMVKTDLNDFMISLPGKTIQQVLRRGKYL 60
         MPELPEVETVR+GLE LV+ QEI ++T+KVPKMVKTDL F ++LPG+ IQ V RRGKYL
Sbjct: 1 MPELPE\7ETVRRGLETIjVI<3QEIVftVTLK\re'KMVKTDLETFALTLPGQIIQSVGRRGK^^ 60

Query: 61 LFDFGEMVMVSHLRMEGKYI^FENKVPDNKHFHLYFKLTNGSTLVYQDWK^ 120
          L D G++V+VSHI1RMEGKYLLFP++VPDNKHFH++F+L N6STLVYQDVRKFGTF+L+ K
Sbjct: 61 LIDIXMLVLVSHIiRMEGKYLLFPDEVPDNKHFHVFFELKNGSTLVYQDVRKFGTPDLIAK 120

Query: 121 SSLKDYFTQKKLGPEPTADTFQFEPFSKGLANSKKPIKPLLLDQRLVAGLGNIYVDEVLW 180
           S I. +F ++KLGPEP +TF+ + F L +S+KPIKP LLDQ LVAGLGNIYVDEVLW
Sbjct: 121 SQLS2^AKRia:X3PEPKKETFKIiKTFEAaLLSSQKPlKPHLLDQTLV3iGI^IYVDEVLW 180

Query: 181 AAKIHPQRLANQLTESETSIJjHKEIlRILTIiGIEKGGSTIRTYKNaLGEDGTMQKYLQVY 240
           AAK+HP+ +++L ++E LH E IRIL LGIEKGGST+RTY+NALG DGTMQ YLQVY
Sbjct: 181 AAKVHPETASSRENKaEIKRLHDETIRIIJU^IEKiGGSTVRTYRimiGftriG™ 240

Query: 241 GKTGQPCPRCGCLIKKIKVGGRGTHYCPRCQ 271
           G+TG+PCPRCG I K+KVGGRGTH CP+CQ
Sbjct: 241 GQTGKPCPRCQQAIVKLKVGGRGTHICPKCQ 271

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2192 

A DNA sequence (GBSx2309) was identified in S.agalactiae <SEQ ID 6771> which encodes the amino 
acid sequence <SEQ ID 6772>. Analysis of this protein sequence reveals the following:

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0797 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10013> which encodes amino acid sequence <SEQ ID
10014> was also identified.



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00353 GB:AP008220 YtaG [Bacillus subtilis]
Identities = 80/189 (42%), Positives = 113/189 (59%), Gaps = 1/189 (0%)

Query: 8 MTKIIGLTGGIASGKSTVTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWLGPEILD 67
         MT +IGLTGGIASGKSTV ++ E G VIDAD + + KG Y+ +++ G +IL
Sbjct: 1 mLVIGLTGGIASGKSTViajMLIEKGITVIDRDIIAKQAVEKGMPAYRQIIDEFGEDILL 60

Query: 68 ADGELDRPKLSQMIFANPDNMKTSflRLQNSIIRQELACQRDQLKQTEEIF-FMDIPLLIE 126
          ++G++DR KL ++F N + + +RQE+ +RD+ E F +DIPLL E
Sbjct: 61 SNGDIDRKKXXSALWTNEQKRLlUmiWPAWQEraJiqR^ 120

Query: 127 EKYIKWFDEIWLVFVDKEKQLQRimRNlSIYSREEaELRLSHQMPLTDKKSPASLIinNNG 186
           K D+I +V V KE QL+RLM RN + EEA R+ QMPL +K + A +IEaSI+G
Sbjct: 121 SKLESLTOKIIWSVrraiLQLERLMKRNQLTEEEAVSRIRSQMPLEERrARftDQVIDNSG 180

Query: 187 DLITLKEQI 195
           L K Q+
Sbjct: 181 TLEETKRQL 189

A related sequence was also identified in GAS <SEQ ID 9111> which encodes the amino acid sequence
<SEQ ID 9112>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty= 0.101 (Affirmative) < succ>
bacterial membrane --- Certainty= 0.000 (Not Clear) < succ>
bacterial outside --- Certainty= 0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 118/191 (61%), Positives = 153/191 (79%)

Query: 9 TKIIGLTGGIASGKSTVTKIIRESGPKVIDADQVVHKLQAKBGKLYQftLLEWIBPEIim 68 

T IIG+TGGIASGKSTV K+IR++G++VIDADQVVH LQ KGG+LY+AL E G +IL A 
Sbjct: 9 TMIIGITGGIASGKSTWKVIRKRGYQVIDADQWHDLQEKGGRLYEZUIjREAFGNQIIiKA 68 

Query: 69 DGELDRPKLSQMIFANPDNMKTSftRLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEK 128 

DGELDR KLS+M+F+NPDNM TS+ +QN II++ELA +RD L Q++ IFFMDIPLL+E 
Sbjct: 69 IXSELDRTKMEMLFSNPDNMaTSSAlQNQIIKEELaAKRDHIiaQSQaiFFMDIPIiLMEIiG 128 

Query: 129 YIKWFDEIWnJVFVDKEKQI^RLMARNNYSREEAELRLSHQMPLTDKKSFASLIIDNNGDL 188 

Y WFD IWLV+VD + QLQRLMARN + +A R++ Q+P+ +KK +ASL+IDN+GD+ 
Sbjct: 129 YQDWFnAIWLVYVDAQTQLQRIMaRNRLDKGKftRQRIASQLPIEEKKPYiiSLVIiajSGDI 188 

Query: 189 ITLKEQILDAL 199 

L +Q+ AL 
Sbjct: 189 AALIKQVQSAL 199

A related GBS gene <SEQ ID 8993> and protein <SEQ ID 8994> were also identified. Analysis of this 
protein sequence reveals a signal peptide at residues 1-16. 

The protein has homology with the following sequences in the databases: 

42.2/60.6% over 189aa 

OMNI|NT01BS3382| Insert characterized 

ORF02237(319 - 885 of 1206) 
OMNI|NT01BS3382 (3 - 192 of 200) () 
%Match =17.0 

%Identity =42.1 %Similarity =60.5 

Matches = 80 Mismatches = 74 Conservative Sub.s = 35 
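
The percentages reported above appear consistent with simple ratios over the 190-residue span of the database entry (positions 3 to 192): 80 matches give the stated identity, and matches plus conservative substitutions give the stated similarity. The short check below is an illustrative sketch only; the choice of 190 as the denominator is an inference from the listed coordinates and is not stated in the source.

```python
# Counts copied from the summary above.
matches = 80
mismatches = 74
conservative = 35

# Assumption: the denominator is the span of the database entry,
# positions 3 to 192 inclusive.
span = 192 - 3 + 1

identity_pct = 100 * matches / span
similarity_pct = 100 * (matches + conservative) / span

print(f"aligned residue pairs: {matches + mismatches + conservative}")
print(f"identity: {identity_pct:.1f}%")      # 42.1%
print(f"similarity: {similarity_pct:.1f}%")  # 60.5%
```

The three counts sum to 189 residue pairs, which matches the "over 189aa" figure quoted with the summary.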



78 108 138 168 198 228 258 288
KNSPTAFG*SIDRl*NKLITQGNYSHFNFRHRKRVn.HD*NI*ECSmGRYDAKVFTGLW*NmTVSKVWLFN*EDKSRRE

318 348 378 408 438 468 498 528
RDALIiPSVSMLMTKI IGLTGGIASGKSTVTKI IRESGFKVIDADQVVHKLQAKCkSiaiYQALLEWLGPEIIiiaDGEIJDR^
|:| I I IJIII : : II h :| :|| ::i::|| I
VDLLTLVIGLTGGIRSGKSTVaNICiIEKGIWIDMJIIAKQAVEKGMPAYRQIIDEFG^
10 20 30 40 50 60 70

558 588 618 648 675 705 735 765
LSQMIFMTPDirora'SflRLQNSIIRQEliACQRDQLKQTEEIFF-MDIPLLIEEKYIKWFDEIWLVETOKEKQL^
1 =:| I : : :|||= : | | : | | :||||| | | |:| :| | || ||:||| ||
LGALVFTlffiQKRLMmiVHPAWQEMIjNRRDEAVaNREAFVV^
90 100 110 120 130 140 150

795 825 855 885 915 945 975 1005
YSREEAELRLSHQMPLTDKKSFASLIIDNNGDLITLKEQILDALQRL*NY*^©OTFIHFLSLLH*F*KTa^*TTVIVQ*Y
: III h III! =1 : I :|l|:| I I h = :
LTEEEAVSRIRSQMPLEEKTARADQVIDNSGTLEETKRQLDEIMNSWA
170 180 190 200

A related DNA sequence was identified in S.pyogenes <SEQ ID 6773> which encodes amino acid sequence
<SEQ ID 6774>. An alignment of the GAS and GBS sequences follows: 

Score = 218 bits (550) , Expect = 4e-59 

Identities = 104/175 (59%), Positives = 138/175 (78%)

Query: 25 WKVIRKAGYQVIDADQWHDLQEKGGRLYEALREAFGNQILKADGELDRTKLSEMLFSN 84
          V K+IR++G++VIDADQWH LQ KGG+LY+AL E 6 +IL ADGELDR KLS+M+F+N
Sbjct: 20 OTKIIRESGFKVIDADQVVHKLQAKGGKLYQALLEWICPEILDaDGELDRPKLSQMIFAN 79

Query: 85 PDlMATSSAIQNQIIKEELAAKRDHLAQSQAIFF^rolPIlIM:LGYQDWFDAIWLVYVDAQ 144
          PDNM TS+ +QN II++ELA +RD L Q++ IFFMDIPLL+E Y WFD IWLV+VD +
Sbjct: 80 PDNMKTSARLQNSIIRQELACQRDQLKQTEEIFFMDIPLLIEEKYIKWFDEIWLVFVDKE 139

Query: 145 TQLQRI«ARISlRIiDKGKRRQRIASQLPIEEKKPYASLVIDNSGDIAALIKQVQSAL 199
           QLQRLMRRN + +A R++ Q+P+ +KK +ASL+IDN+GD+ L +Q+ AL
Sbjct: 140 RQMRLMARNlIYSREEAELRLSHQMPLTDKKSFASLlIDNNGDLITLKEQILnAL 194

SEQ ID 8994 (GBS245) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 61 (lane 6; MW 23.7kDa). It was also expressed in E.coli as a GST-fusion
product, and purified GBS245-GST is shown in Figure 211, lane 6. 

The purified GST fusion product was used to immunise mice and the resulting antiserum was used for 
FACS (Figure 278). This confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2193 

A DNA sequence (GBSx2310) was identified in S.agalactiae <SEQ ID 6775> which encodes the amino 
acid sequence <SEQ ID 6776>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.4073 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
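
Each "Final Results" listing assigns a certainty value to three candidate locations, and the location marked "Affirmative" is the one with the highest value. The Python sketch below shows one way such a listing could be read programmatically. It assumes cleanly transcribed lines of the form shown in its sample text; the names parse_final_results and predicted_location are hypothetical helpers, not part of the prediction software.

```python
import re

# One result line: location name, optional dashes, certainty value, verdict in parentheses.
RESULT_LINE = re.compile(
    r"^\s*(bacterial\s+\w+)\s*-*\s*Certainty\s*=\s*(\d+\.\d+)\s*\(([^)]+)\)"
)


def parse_final_results(text: str) -> list[tuple[str, float, str]]:
    """Return (location, certainty, verdict) for every result line in text."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            location = " ".join(match.group(1).split())
            results.append((location, float(match.group(2)), match.group(3).strip()))
    return results


def predicted_location(results: list[tuple[str, float, str]]) -> str:
    """Return the location with the highest certainty value."""
    return max(results, key=lambda item: item[1])[0]


sample = """Final Results
bacterial cytoplasm --- Certainty=0.4073 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(sample)
print(predicted_location(parsed))  # bacterial cytoplasm
```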



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA30330 GB:AP000005 253aa long hypothetical ATP-binding 
transport protein [Pyrococcus horikoshii] 
Identities = 78/240 (32%) , Positives = 130/240 (53%) , Gaps = 13/240 (5%) 

Query: 3 LVIRDIRKRFQETEVLRGASYRFYSGKITGVLGRNGAGKTTLFNILYGDLAADNGTICLL 62 
+++ ++RK+F EVL+G ++ G+I G+LG NG+GK+T IL G + G + + 

Sbjct: 2 IIVEISrLRKKFGSKEVLKGINFTVNDGEIYGLLGPNGSGKSTTMRILSGIITDFEGKVMVA 61 

Query: 63 -KDNHEYPLTDKD1-GIVYSENYLPEFLTGYEFVKFYMDLH--PSDDL-MTIDDYLDFME 117 
D P+ K+I GV LELTEFF + PDL + +D 

Sbjct: 62 GVDVSRDPMKVKEIVGYVPETPALYESLTPAEFFSFIGGVRRIPQDILEERVKRLVDAFG 121 

Query: 118 IGQTERHRIIKBYSDGMKSKLSLICLMISKPKVILLDEPLTAVDWSSIAIKRLLLELSE 177 

IG+ +++I S G K K+SLI ++ P+V++LDE + +D S+ + LL E E 
Sbjct: 122 IGK-YMNQLIGTLSFGTRQKISLISALLHDPQVLILDEAMNGLDPKSARIFRELLFEFKE 180 

Query: 178 D-HIIILSTHIMALAEDLCDIVAVLDKGKL---QTLDIDR---KHEQFEERLLQVLKGDE 230 

+ 1+ STHI+ALRE +CD + ++ +G++ T+D R + E+ E+ L++ + E 
Sbjct: 181 EGKSIWSTHILALAEVMCDRIGIIYEGRIVAEGTIDELREIAREEKLEDIFLKLTQAKE 240 

There is also homology to SEQ ID 2876. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2194 

A DNA sequence (GBSx2311) was identified in S.agalactiae <SEQ ID 6777> which encodes the amino 
acid sequence <SEQ ID 6778>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.6138 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2195 

A DNA sequence (GBSx2312) was identified in S.agalactiae <SEQ ID 6779> which encodes the amino 
acid sequence <SEQ ID 6780>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -15.34 Transmembrane 526 - 542 ( 511 - 546)
INTEGRAL Likelihood = -9.61 Transmembrane 340 - 356 ( 335 - 359)
INTEGRAL Likelihood = -8.17 Transmembrane 455 - 471 ( 451 - 476)
INTEGRAL Likelihood = -8.01 Transmembrane 97 - 113 ( 95 - 121)
INTEGRAL Likelihood = -8.01 Transmembrane 216 - 232 ( 207 - 236)
INTEGRAL Likelihood = -3.40 Transmembrane 50 - 66 ( 46 - 67)
INTEGRAL Likelihood = -1.33 Transmembrane 178 - 194 ( 178 - 194)
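
Each INTEGRAL entry gives a likelihood score, a predicted transmembrane segment and, in parentheses, an extended range. As an illustration only, the Python sketch below extracts these fields from lines written in that layout and reports the segment length. The regular expression and the name parse_integral_lines are assumptions made for this example, not part of the prediction software.

```python
import re

# Fields: likelihood, segment start and end, then the extended range in parentheses.
INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(text: str) -> list[dict]:
    """Return one record per INTEGRAL line found in text."""
    records = []
    for match in INTEGRAL_LINE.finditer(text):
        likelihood = float(match.group(1))
        start, end, ext_start, ext_end = (int(g) for g in match.groups()[1:])
        records.append({
            "likelihood": likelihood,
            "segment": (start, end),
            "extended": (ext_start, ext_end),
            "segment_length": end - start + 1,  # inclusive residue count
        })
    return records


sample = """INTEGRAL Likelihood = -15.34 Transmembrane 526 - 542 ( 511 - 546)
INTEGRAL Likelihood = -1.33 Transmembrane 178 - 194 ( 178 - 194)
"""
records = parse_integral_lines(sample)
for record in records:
    print(record["segment"], record["segment_length"], record["likelihood"])
```

For the two sample lines the segment length is 17 residues in both cases.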



Final Results
bacterial membrane --- Certainty=0.7135 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10011> which encodes amino acid sequence <SEQ ID 10012>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 376. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2196 

A DNA sequence (GBSx2314) was identified in S.agalactiae <SEQ ID 6781> which encodes the amino 
acid sequence <SEQ ID 6782>. Analysis of this protein sequence reveals the following: 



Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9401> which encodes amino acid sequence <SEQ ID 9402> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA07482 GB:AJ007367 multi-drug resistance efflux pump
[Streptococcus pneumoniae] 
Identities = 213/372 (57%), Positives = 295/372 (79%) 

Query: 1 MPFMVLYVEQLGAPSNKVEWYAGLSVSLSALSSALVAPLWGRLADKYGRKPMMVRAGLMM 60 
+PFM ++VE LG S +V +YAGL++S+SA+S+AL +P+WG LADKYGRKPMM+RAGL M 

Sbjct: 28 VPFMPIFVENLGVGSQQVAFYAGLAXSVSAISAALFSPIWGILADKYGRKPMMIRAGLAM 87 

Query: 61 TFTMGGIAFIHSVTGLLILRILNGIFAGYVENSTALIASQAPQEESGYALGTLATGVTGG 120 
T TMGGLAF+ ++ L+ LR+LNG+PAG+VPN+TALIASQ P+E+SG ALGTL+TGV G 
Sbjct: 88 TITMGGLAFVPNIYTOilFLRLLNGVPAGFVPNATALIASQVPKEKSGSALGTLSTGVVRG 147 

Query: 121 MLIGPLLGGLLAEWFGIREVFLLVGTILLISTLMTIFMVKEDFKPISNEETMPTTEVFKS 180 

L GP +GG +AE FGIR VFLLVG+ L ++ ++TI +KEDF+P++ E+ +PT E+F S 
Sbjct: 148 TLTGPFIGGFIAELFGIRTVFLLVGSFLFLAAILTICFIKEDFQPVAKEKAIPTKELFTS 207 

Query: 181 VKSLQILIGLFVTSMIIQISAQSIAPILTLYIRHLGQTENLMFVSGLIVSGMGFSSILSS 240 

VK +L+ LF+TS +IQ SAQSI PIL LY+R LGQTENL+FVSGLIVS MGPSS++S+ 
Sbjct: 208 VKYPYLLLNLFLTSFVIQFSAQSIGPILALYVRDLGQTENLLFVSGLIVSSMGFSSMMSA 267 

Query: 241 PKLGRIGDRIGlffiRLLLLALLYSFLMYVLCSLAQTSLQLGVIRFLYGFGTGALMPSINSI 300 

+G++GD++GNHRLL++A YS ++Y+LC+ A + LQLG+ RFL+G GTGAL+P +N++ 
Sbjct: 268 GWGKLGDKVGNHRLLVVAQFYSVIIYLLCSVNASSPLQLGLYRFLFGLGTGALIPGVNAL 327 



Query: 301 LTKIAPRQGLSRIFSYNQMFSNLGQVLGPFVGSAVSIHLGFRWVFFVTSFIVLRNFVWCF 360 
L+K+ P+ G+SR+F++NQ+F LG V+GP GSAV+ G+ VF+ TS V + ++ 
Sbjct: 328 LSKMTPKAGISRVFAFNQVFFYLGGWGPMAGSAVAGQFGYHAVFYATSLCVAFSCLFNL 387 

Query: 361 INFRKYIRVKEI 372 
I FR ++VKEI 
Sbjct: 388 IQFRTLLKVKEI 399 
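
The identity count reported for an alignment is the number of aligned positions at which the query and subject residues are the same. As a minimal sketch, the Python function below counts identities for one aligned block, using the final block of the alignment above (query residues 361-372 against subject residues 388-399). The name count_identities is illustrative. Counting "Positives" would additionally require the substitution matrix used by the alignment program, which is not reproduced here.

```python
def count_identities(query: str, subject: str) -> int:
    """Count aligned positions where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # Gap characters never count as identities.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


query_block = "INFRKYIRVKEI"    # Query residues 361-372
subject_block = "IQFRTLLKVKEI"  # Sbjct residues 388-399
identical = count_identities(query_block, subject_block)
print(identical, "of", len(query_block))  # 7 of 12
```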

A related DNA sequence was identified in S.pyogenes <SEQ ID 6783> which encodes the amino acid 
sequence <SEQ ID 6784>. Analysis of this protein sequence reveals the following: 

Possible site: 58 
>>> Seems to have a cleavable N-term signal seq. 
143 - 162)
INTEGRAL Likelihood = -1.70 Transmembrane 279 - 295 ( 279 - 297)
INTEGRAL Likelihood = -0.85 Transmembrane 209 - 225 ( 209 - 226)
INTEGRAL Likelihood = -0.27 Transmembrane 347 - 363 ( 347 - 363)



Final Results
bacterial membrane --- Certainty=0.5055 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAA07482 GB:AJ007367 multi-drug resistance efflux pump
[Streptococcus pneumoniae] 
Identities = 236/396 (59%), Positives = 309/396 (77%) 

Query: 1 VNWRQNLKVAWLGNFFTGASFSLVMPFMALYVENLGTPTELVEYYAGLAVAVTALASALF 60 

+NW+ NL++AW GNF TGAS SLV+PEM ++VENLG ++ V +yAGLA++V+A+++ALF 
Sbjct: 4 INWKDNLRIAWFGNFLTGASISLWPFMPIFVENLGVGS(3QVAFYAGIiAISVSAISAALF 63 

Query: 61 APWGKLaDRYGRKPMMLRASFVMTFTMGGLAIIPm^FWLLILRLLTGVSAGYVENAT 120 

+P+WG LAD+YGRKPMM+RA MT TMGGLA +EN++WL+ LRLL GV AG+VENATAL 
Sbjct: 64 SPIWGILRDKYGRKPmiRAGIAMTITMGGLAFVPNIYWLIPLRLLNGVFAGFVPNATAL 123 



Query: 121 IASQAPKEESGYALGTLATGVTAGALIGPLLGGILAELLGIRQVFLLVGVILFLCSLMTA 180 

lASQ PKE+SG ALGTL+TGV AG L GP +GG +AEL GIR VFLLVG LFL +++T 
Sbjct: 124 IASQVPKEKSGSALGTLSTGWAGTLTGPFIGGFIAELFGIRTVFLLVGSFLFLAAILTI 183 

Query: 181 VYVKEEFKPVRRFEMIPTKVILKQVKSPQIMLGLFVTSMIIQISAQSVAPILSLYIRHLG 240 

++KE+F+PV + + IPTK + VK P ++L LF+TS +IQ SAQS+ PIL+LY+R LG 
Sbjct: 184 CFIKEDFQPVAKEKAIPTKELFTSVKYPYLLLNLFLTSFVIQFSAQSIGPILALYVRDLG 243 

Query: 241 QTHNLMFTSGLWSAMGFSSLFSSSYLGKLGDRFGNHRLLLAALCYSFIMYFSSALAQTS 300 

QT NL+P SGL+VS+MGFSS+ S+ +GKLGD+ GNHRLL+ A YS I+Y A A + 
Sbjct: 244 QTENLLFVSGLIVSSMGFSSMMSAGVMGKLGDKVGNHRLLVVaQFYSVIIYLLCftNASSP 303 

Query: 301 FQLGVLRFAYGFGVGALMPSINSLLTKLTPKEGISRVFAYNQMFSNLGQVIGPFIGSNWA 360 

QLG+ RF +G G GAL+P +N+LL+K+TPK GISRVFA+NQ+F LG V+GP GS VA 
Sbjct: 304 LQLGLYRFLFGLGTGALIPGVNALLSKMTPKAGISRVFAFNQVFFYLGGWGPMAGSAVA 363 

Query: 361 WLGYRSVFYVTSLIVFVNLIWSLIIFRKYIKVKDI 396 

GY +VFY TSL V + +++LI FR +KVK+I 
Sbjct: 364 GQFGYHAVFYATSLCVAFSCLENLIQFRTLLKVKEI 399 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 262/373 (70%) , Positives = 314/373 (83%) 

Query: 1 MPFMVLYVEQLGAPSNKVEWYAGLSVSLSALSSALVAPLWGRLADKYGRKPMMVRAGLMM 60 

MPEM LYVE LG P+ VE+YAGL+V+++AL+SAL AP+WG+LAD+YGRKPMM+RA +M 
Sbjct: 25 MPFMALYVENI/3TPTELVEYYA6LAVAVTAIASALFAPVWGKLADRYGRKPMMLRASFVM 84 



Query: 61 TFTMGGLAFIHSVTGLLILRILNGIFAGYVPNSTALIASQAPQEESGYALGTLATGVTGG 120 
TFTMGGLA I +V LLILR+L G+ AGYVPN+XALIASQAP+EESGYALGTLATCIVT G 
Sbjct: 85 TFTMGGLAIIPNVFWLLILRLLTGVSAGYVPNATMilASQAPKEESGYALGTL^ 144 

Query: 121 ML•IGPLLGGLIJamTOIREVPLLVGTILLISTL^^'IF^TOCEDFKPISNEETMPTTEVFKS 180 

LIGPLLGG+LRE GIR+VFLLVG IL + +LMT VKE+FKP+ E +PT + K 
Sbjct: 145 MiIGPLLGGILAELLGIRQVFLLVGVILFLCSLMTAVYVKEEFKPVRRFEMIPTKVILKQ 204 

Query: 181 VKSLQILIGLFVTSMIIQISAQSIAPILTIiYIRHLGQTENLMFVSGLIVSGMGFSSIIiSS 240 

VKS. QI++GLFVTSMIIQISAQS+APIL+LYIRHLGQT NLMF SGL+VS MGFSS+ SS 
Sbjct: 205 VKSPQIMMLPVTSMIIQISAQSVAPILSLYIRHIXSQTHNLMFTSGLWSAMGFSSLFSS 264 

Query: 241 PKICRIGDRIGNHRLLLMLLYSFLMYVLCSIiAQTSLQLGVIRFLYGFGTGALMPSINSI 300 

LG++GDR GNHRLLL AL YSF+MY +LAQTS QLGV+RF YGFG GALMPSINS+ 
Sbjct: 265 SYLGKLGDRFGNHRLLLAALCYSFIMYFSSALAQTSFQLGVLRFAYGFGVGfliMPSINSL 324 

Query: 301 LTKlAPRQGLSRIFSYNQMFSNLGQVLGPFVGSAVSIHIfiFRWWFVTSFIVIJttlF^ 360 

LTK+ P++G+SR+F+YNQMFSNLGQV+GPP+GS V++ LG+R VF+VTS IV N +W 
Sbjct: 325 LTKLTPKEGISRWAYNQMFSS&GQVIGPFIGSNVAVVLGYRSVFYVTSLIVFVNLIW^ 384 

Query: 361 INERKYIRVKEIV 373 

I FRKYI+VK+IV 
Sbjct: 385 IIFRKYIKVKDIV 397 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2197 

A DNA sequence (GBSx2315) was identified in S.agalactiae <SEQ ID 6785> which encodes the amino 
acid sequence <SEQ ID 6786>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.2343 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB6998e GB:U34356 glycerol kinase [Enterococcus faecalis] 
Identities = 156/186 (83%) , Positives = 167/186 (88%) , Gaps = 1/186 (0%) 

Query: 3 SEEKYimiDQGTTSSRAIIENKKGEKIASSQKEFPQIFPQAGWVEHISIfiNQIWNSVQSVI 62 

+EEKYIMAIDQGTTSSRAIIP+KKG KI SSQKEF Q FP AGWVEHNAN+IWNSVQSVI 
Sbjct: 2 AEEKYIMAIDQGTTSSRAIIFDKKGNKIGSSQKEFTQYFPNAGWVEHWANEIWNSVQSVI 61 

Query: 63 AGAFIESSIKPGQIEAIGITNQRETTWWDKKTGLPIYNAIWQSRQTAPIADQLKQEGH 122 

AG+ lES +KP I IGITNQRETTWWDK TGLPIYNAIVWQSRQT PIflDQLK++G+ 
Sbjct: 62 AGSLIESGVKPTDIAGIGITNQRETTWWDKATGLPIYNAIVWQSRQTTPIADQLKEDGY 121 

Query: 123 TNMIHEKTGLVIDAYFSATKTOWILDHVPGAQERAEKGELIJGTIDTWLVWK^^ 182 

+ MIHEKTGL+IDAYFSATKVRWILDHV GAQERAE GEL+F6TIDTWLVWKLT G HV 
Sbjct: 122 SEMIHEKTGLIiraYFSATKTOWILDHVEGAQERAENGEIMFGTIDTWLVWKLT-GDTHV 180 

Query: 183 TDYSNA 188 

TDYSNA 
Sbjct: 181 TDYSNA 186 

There is also high homology to SEQ ID 2844: 

Identities = 174/186 (93%) , Positives = 182/186 (97%) 

Query: 3 SEEKYIMAIDQGTTSSRAIIFNKKGEKIASSQKEFPQIFPQAGWVEHNANQIWNSVQSVI 62 
S+EKyiMAIDQGTTSSRAIIFN+KGEK++SSQKEFPQIFP AGWVEHNANQIWNSVQSVI 




Sbjct: 2 61

Query: 63 AGAFIESSIKPGQIEaiGITNQRETTVVWDKKTGLPIYNAIVWQSRQTAPIADQLKQEGH 122
AGAPIESSIKP QIEAIGITNQRETTWWDKRTG+PnfMAIVWQSRQTAPIA+QLKQ+GH
Sbjct: 62 AGRFIESSIKPSOIEAIGITNORETTWWDKKTGVPIYNRIVWOSROTAPIAEOLKODGH 121

Query: 123 TOMIHEKTGIiVIDAYFSATKVRWILDHVPGAQERAEKGELLFGTIDTWLVWKLTDGLVHV 182
T MIHEKTGLVIDAYFSATK+RWILDHVPGAQERAEKGELLFGTIDTWLVWKLTDG VHV
Sbjct: 122 TKMIHEKTGLVIDAYFSATKIRWILDHVPGaQERAEKGELLFGTIDTWLVWKLTDGAVHV 181

Query: 183 TDYSNA 188
TDYSNA
Sbjct: 182 TDYSNA 187





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2198 

A DNA sequence (GBSx2317) was identified in S.agalactiae <SEQ ID 6787> which encodes the amino 
acid sequence <SEQ ID 6788>. This protein is predicted to be glycyl-tRNA synthetase beta chain (glyS). 
Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.2933 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14468 GB:Z99117 glycyl-tRNA synthetase (beta subunit)
[Bacillus subtilis] 
Identities = 315/687 (45%), Positives = 447/687 (64%), Gaps = 21/687 (3%) 



Query: 3 KDLLLELGIiEEriPAYWTPSEKQLGQKMVKFLEDHRLSFETVQIFSTPRRLAVRVKGLAD 62
+DLLLE+GLEE+PA + S QLG K+ +L++ ++ V++F+TPRRLAV VK +A+
Sbjct: 4 QDLLLEIGLEEMPARFI^SMVQLGDKLTGWLKEKNITHGEWKLENTPRRIA^^ 63

Query: 63 QQTDLTEDFKGPSKKIALnaEGNFSKAAQGFVRGKlGLSVDDIEFREVKGEEYVYVTKHET 122
+Q D+ E+ KGP+KKIAI1DA+GN++KAA GF +G+G +V+D+ +EVKG EYV+V K +
Sbjct: 64 KQDDIKEEAKGPAKKIALDADGNWTKAAIGFSKGQGANVEDLYIKEVKGIEYVFVQKFQA 123

Query: 123 GKSAIDVLASVTEVLTELTFPVMNHWANNSFEYIRPVHTLVVIjLDDQALELDPLDIHSGR 182
G+ +L ++ ++T L FP NM W N YIRP+ +V L + ++ SGR
Sbjct: 124 GQETKSLLPELSGLITSLHFPKNMRWGNEDLRYIRPIKWIVALFGQDVIPFSITNVESGR 183

Query: 183 ISRGHRFLGSDTEISSASSYEDDLRQQFVIADAKERQQMIVNQIHAIEEKKNISVEIDED 242
++GHRFLG + I S S+YE+ L+ Q VIAD R+QMI +Q+ + + N S+ +DED
Sbjct: 184 TTQGHRFLGHEVSIESPSAYEEQLKGQHVIADPSVRRQMIQSQLETMftAENNWSIPVDED 243

Query: 243 LLNEVIiNLVEYPTAFLGSFDEKyLDVPEEVLVTSMKNHQRYFWRDRDGKLLPNFISVRN 302
LL+EV +LVEYPTA GSF+ ++L +PEEVLVT+MK HQRYF V+D++G LLP+FI+VRN
Sbjct: 244 LLDEVNHLVEYPTALYGSFESEFLSIPEEVLVTTMKEHQRYFPVKDKNGDLLPHFITVRN 303

Query: 303 GNAEHIENVIKGNEKVLVRRIBDGEFFWQEDQKMillADIiVEKLKQVTFHEKIGSLYEHMD 362
GN+ lENV +GNEKVL ARL D FF++EDQKIjNI V+KL+ + FHE++GSL + +
Sbjct: 304 GNSHAIENVARGNEKOT^RARLSDASFFYKEDQKLNIDANVKKLENIVFHEELGSIADKVR 363

Query: 363 RVKVISQYLAEKADLSDEEKIAVLRaASIYKFDLLTGMVDEFDELQGIMGEKYALLAGEQ 422
RV I++ lift. + ++ V RAA I KFDL+T M+ EF ELQGIMGEKYA + GE
Sbjct: 364 RVTSIAEKLAVRLQADEDTLKHVKRAAEISKFDLVTHMIYEFPELQGIMGEKYARMLGED 423

Query: 423 PAVAAAIREHYMPTSADGELPETRVGAILALADKFDTLLSFFSVGLIPSGSNDPYALRRA 482 

AVAAA+ EHYMP SA GE P T GA++A+ADK DT+ SFFS+G+IP+GS DPY L R 
Sbjct: 424 EAVAAA\nJEHYMPRSAGGETPSTFTGAWAMADKLDTIASFFSIGVIPTGSQDPYGLPRQ 483 

Query: 483 TQGITOILEaFGPTOIPLDELVTNLYGIiSFASI^iYANQKEVMAFISARIEKMIGS-KTO 541 
GIV IL W I +EL+T F D N E++ F + R++ ++ + ++ D 

Sbjct: 484 ASGIVaiLLDKNWGISFEELLT FVQTDKEN--ELLDFFTQRLKyVtNftEQIRHD 535 

Query: 542 IREAVLESDTYIVSLILEASQALVQKSKDAQYKVSVESLSRAFNLAEKVTHSVLVDSSLF 601 
+ +AVLES L +Q L QK +K + E+L R ++++K + LF 

Sbjct: 536 VIDAVLESSELEPYSaLHKAQVLEQKlXSaPGFKETAEaLGRVISISKKGVRGD-IQPDLP 594 

Query: 602 ENNQEKRLYQAILSLELTEDMHDNLDK LFALSPIINDPFDNTMVMTDDEKM 652 

EN E L+ A + + E++ +N K L AL 1+ +FD+TMV+ D+E + 

Sbjct: 595 ENEYEAKLFDAYQTAK--ENLQENFSKKDYEAALASLAALKEPIDAYFDirrMVIADNESL 652 

Query: 653 KQNRLAILNSLVAKARTVAAFNLLNTK 679 

K NRLA + SL + ++ A N L K 
Sbjct: 653 KANRLAQMVSLADEIKSFANMNALIVK 679 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2835> which encodes the amino acid 
sequence <SEQ ID 2836>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.96 Transmembrane 450 - 466 ( 450 - 466) 

Final Results
bacterial membrane --- Certainty=0.1383 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 505/679 (74%) , Positives = 578/679 (84%) 

Query: 1 MTKDLLLELGLEELPAYWTPSEKQLGQKMVKFLEDHRLSFETVQIFSTPRRLAVRVKGL 60 

M+K+LL+ELGLEELPAYVVTPSEKQLG+++ FL ++RLSFE +Q FSTPRRLAVRV GL 
Sbjct: 1 MSKNLLIELGLEELPAYWTPSEKQLGERLATFLTENRLSFEDIQTFSTPRRLAVRVSGL 60 

Query: 61 ADQQTDLTEDFRQPSKKIALDAEGNFSKAAQGPVRGKGLSVDDIEPREVKGEEYVYVTKH 120 
ADQQTDLTEDFKGP+KKIALnA+GNPSKAAQGFVRGKGL+ D lEFREVEGEEYVYVTKH 

Sbjct: 61 ADQQTDLTEDFKGPAKKIAU3ADGNFSKaAQGPVR6KGLTTnAIEFREVKGEEYVYVTKH 120 

Query: 121 ETGKSAIDVLASVTEVLTELTFPVNMHWANNSPEYIRPVHTLWLLDDQALELDFLDIHS 180 
E GK A +VL VTEVL+ H-TFPV+MHWANNSFEYIRPVHTL VLL+D+ALELDFLDIHS 
Sbjct: 121 EAGKPAKEVLLGVTEVLSAMTFPVSMHWANNSPEYIRPVIITLTVLIiNDEftLELDFLDIHS 180 

Query: 181 GRISRGHRFLGSDTEISSASSYEDDLRQQFVIADAKERQQMIVNQIHAIEEKKNISVEID 240 

6R+SRGHRFLG++T I+SA SYE DLR QFVIADAKERQ+MIV Ql +E ++ + V+ID 
Sbjct: 181 GRVSRGHRFLGTETTITSADSYEADLRSQFVIADAKERQEMIVEQIKTLEVEQGVQVDID 240 

Query: 241 EDLLNEVLNLVEYPTAFLGSFDEKYLDVPEEVLVTSMKNHQRYFVVRDRDGKLLPNFISV 300 

EDLLNEVLNLVE+PTAF+GSF+ KYLDVPEE\njVTSMKNHQRYEWRD+ G L+PNF+SV 
Sbjct: 241 EDLUffiVLNLVEFPTAFMGSFEAKYLDVPEEVLVTSMKNHQRYFVVRDQAGHLMPNFVSV 300 

Query: 301 RNGNREHIENVIiOSNEKVLVaRIiEIXSEFFWQEDQKLNIADLVEKLKjQVTFHEKIGSLYEH 360 

RNGN + lENVIRtaiEKVLVaRLEDGEFFW+EDQKL lADLV KL VTPHEKIGSL EH 
Sbjct: 301 RN6NDQAIENVIKGNEKVL^ZRRLEDGEFFWREDQKLQIADLVaKLTIm^FHEKIGSLAm 360 

Query: 361 MDRVKVISQYLAEKADLSDEEKLAVLRAASIYKFDLLTGMVDEFDELQGIMGEKYALLAG 420 
MDR +VI+ LA++A+LS EE AV RAA lYKFDLLTGMV EPDELQGIMGEKYALLAG 

Sbjct: 361 ^©RTRVIAflSLaKEaNLSAEEVTAVDRaAQIyKFDLLTG^^7GEEDEIlQGIMGEKyAIJCJ^ 420 

Query: 421 EQPAVAAAIREHYMPTSADGELPETRVGAILALADKFDTLLSFFSVGLIPSGSNDPYALR 480 
E AVA AIREHY+P +A G LPET+VGA+LALA K DTLLSFFSVGLIPSGSNDPYALR 
Sbjct: 421 EDAAVATAIREHYLPDAAGGALPETKVGAVLALAAKLDTLLSFFSVGLIPSGSNDPYALR 480 
Query: 481 RATQGIWILEAFGWDIPLDELVTNLYGLSFASLDYANQKEVMAFISARIEKMIGSKVPK 540 

RATQGIVRIL+ FGW IP+D+LV +LY LSF SL YAN+ +VM FI AR++KM+G PK 
Sbjct: 481 RATQGIWILDHFGWRIPlC)KLVDSLYDLSFDSLTYMSIKftDVMNFIRARTO 540 

Query: 541 DIREAVLESOTYIVSLILEASQftLVQKSKmQYKVSVESLSRAFNLAEK^ 600 

DIREA+L S T++V +L A++ALV+ S YK +VESLSRAENI1AEK SV VD SL 
Sbjat: 541 DIREAlIASSTPVVPEMLRAftEaLVKASHTEOTKPAVESLSRAFNLAEKADASVQVDPSL 600 

Query: 601 FENNQEKALYQAILSLELTEDMHDNLDKLFALSPIINDFFDNT^1VMTDDEICMKQNRLAIL 660 

FEN QE L+ AI L L L+++FALSP+INDFFDNTMVM D+ +K NRLAIL 

Sbjct: 601 FENEQENTLFAAIQGLTLAGSAAQQLEQWALSPVIlTOFPDNTMVMAGDQftLKNNRLAIL 660 

Query: 661 NSLVAKARTVAAENLUITK 679 

+ LV+KA+T+ AFN lOTK 
Sbjct: 661 SDLVSKAKTIVAFNQLNTK 679 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2199 

A DNA sequence (GBSx2318) was identified in S.agalactiae <SEQ ID 6789> which encodes the amino 
acid sequence <SEQ ID 6790>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.2182 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD24436 GB:AF112858 NAD(P)H dehydrogenase [Bacillus stearothermophilus] 
Identities = 64/174 (36%), Positives = 98/174 (55%), Gaps = 6/174 (3%) 

Query: 2 NTLIVNSHPDFSNPYSFTTILQEKFIELYNEHFPNHQLSILNLYDCVLPEITKEVLLSIW 61 

N L + +HP + S++ + + FI+ Y + P+H++ L+LY +PEI +V S W 
Sbjct: 3 imiYITAHPH-DDTQSYSMAVGKRFIDTYKQiVHPDHEVIHLDLYKEYIPEIDVDVF-SQW 60 

Query: 62 SKQRKGL---ELTADEIVQAKISKDLLEQFKSHHRIVFVSPMHNyNVTARAKTyiDNIFI 118 

K R G EL+ +E + +L EQF S + VFV+PM N++ K YID + + 

Sbjct: 61 GKLRSGKSFEELSDEEKSUCVGRMNELCEQFISADKYVPVTPMWNFSFPPVLKAYIDAVAV 120 

Query: 119 AGETFKYTENGSVGLMTDDYRLLMLESAGSIYSKGQYSPYEPPVHYLKAIFKDF 172 

AG+TFKYTE 6 VGL+TD + L +++ G YS+G + E YL I + F 
Sbjct: 121 AGRTFKYTEQGPVGLLTDK-KAIiHIQftRGGFYSEGPAAEMEMGHRYLSVIMQFF 173 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2200 

A DNA sequence (GBSx2319) was identified in S.agalactiae <SEQ ID 6791> which encodes the amino 
acid sequence <SEQ ID 6792>. This protein is predicted to be glycyl-tRNA synthetase (glyQ). Analysis of 
this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.1364 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9521> which encodes amino acid sequence <SEQ ID 9522> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05089 GB:AP001511 glycyl-tRNA synthetase (alpha subunit)
[Bacillus halodurans] 
Identities = 222/287 (77%), Positives = 250/287 (86%) 

Query: 6 LTFQEIILTLQQFWNDQGCMLMQAYDNEKGAGTMSPYTFLRAIGPEPWNAAYVEPSRRPA 65 

+ Q +ILTLQ++W+ Q C+L+QAYD EKGAGTMSPYT LR IGPEPWN AYVEPSRRPA 
Sbjct: 1 MNVQTMILTLQEYWSKjQNCILrlQATOTEKBAGTMSPYT^^ 60 

Query: 66 DGRYGENENRLYQHHQFQVVMKPSPSNIQELYLKSLELLGINPLEHDIRFVEDNWENPST 125 

DGRYGENPNRLYQHHQFQV+MKPSP+NIQELYL SL LGINPLEHDIRFVEDNWENPS 

Sbjct: 61 DGRYGENPNRLYQHHQFQVIMKPSPTNIQELYLDSLRALGINPLEHDIRFVEDNWENPSL 120 

Query: 126 GSAGLGWEVWLDGMEITQFTYFQQVGGLQTGPVTSEVTYGLERLASYIQEVDSVYDIEWA 185 
G AGLGWEVWLDGMEITQFTYFQQVGGL+ PV++E+TYGLERLASYIQ+ ++V+D+EW 
Sbjct: 121 GaVGLGWEVWLDGMEITQFTYFQQVGGLEANPVSaEITYGLERLASYIQDKENVFDIiEWV 180 

Query: 186 PGVKyGEIFTQPEYEHSKYSFEISDQVMLLENFEKFEREAKRALEEGLVHPAYDYVLKCS 245 

G YG+IFTQPEYEHSKY+FE+SD ML E F +E+EA RftLEE LV PAYDYVLKCS 
Sbjct: 181 EGFTYGDIFTQPEYEHSKYTFEVSDSftMLFELFSTYEKEADRALEENLVFPAYDYVLKCS 240 

Query: 246 HTFNLLDARGAVSVTERAGYIARIRNLARWAKTFVAERKKLGFPLL 292 

HTFNLIiDARGA+SVTER GYI R+RNLAR AK + ER+KLGFP+L 
Sbjct: 241 HTFNLLDARGAISVTERTGYIGRVRNLRRKCSUCKYYEEREKLGFPML 287 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6793> which encodes the amino acid 
sequence <SEQ ID 6794>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.2081 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 290/304 (95%) , Positives = 294/304 (96%) 

Query: 2 MSKKLTFQEIILTLQQFVWTOQGCMLMQAYDNEKGAGTMSPYTFLRAIGPEPWNAAYVEPS 61 
MSKKLTFQEIILTLQQ+WNDQGCMLMQAYDNEKGAGTMSPYTFLRAIGPEPWNAAYVEPS 
Sbjct: 1 MSKKLTFQEIILTLQQYWlTOQGCMLMQAYDNEKGAGTMSPYTFLRAIGPEPmAAYVEPS 60 

Query: 62 RRPADGRYGENPNRLYQHHQFQWMKPSPSNIQELYLKSLELLGINPLEHDIRFVEDNWE 121 

RRPADGRYGENPNRLYQHHQFQWMKPSPSNIQELYL SLE LGINPLEHDIRFVEDNWE 
Sbjct: 61 RRPADGRYGENPMRLYQHHQFQWMKPSPSNIQELYLASLEKLGINPLEHDIRFVEDNWE 120 

Query: 122 NPSTGSAGLGWEVWLDGMEITQFTYFQQVGGLQTGPVTSEVTYGLERLASYIQEVDSVYD 181 

NPSTGSAGLGWEVWLDGMEITQFTYFQQVGGL T PVT+EVTYGLERLASYIQEVDSVYD 
Sbjct: 121 NPSTGSAGLGWEVWLDGMEITQFTYFQQVGGLATSPVTAEVTYGLERLASYIQEVDSVYD 180 

Query: 182 lEWAPGVKYGEIFTQPEYEHSKYSFEISDQVMLLENFEKFEREAKRALEEGLVHPAYDYV 241 

lEVlAPGVKYGEIP QPEYEHSKYSFEISDQ MLLENFEKFE+EA RALEEGLVHPAYDYV 
Sbjct: 181 IEWAP6VTaGEIFLQPEYEHSKYSFEISDQDMLLENFEKFEKEASRAI.EEGLVHPAYDYV 240 
Query: 242 LKCSHTFNLLDARGAVSWERAGYIARIRNLARWAKTFVAERKKLGFPLLDEETRIKLL 301 

LKCSHTBWLIinARGAVSVTERAGYIARIRNrJUJVVAKTFVaERKKLGFPLIiDE TR hh 
Sbjct: 241 LKCSHTFNLI^MlGftVSVTERaGyiARimttJUlWAKTFVSffiRKKLG^^ 300 

Query: 302 AEED 305 

AE+D 

Sbjct: 301 AEDD 304 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2201 

A DNA sequence (GBSx2320) was identified in S.agalactiae <SEQ ID 6795> which encodes the amino 
acid sequence <SEQ ID 6796>. This protein is predicted to be vacB protein (vacB). Analysis of this protein 
sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.2966 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9399> which encodes amino acid sequence <SEQ ID 9400> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15366 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 338/780 (43%) , Positives = 485/780 (61%) , Gaps = 47/780 (6%) 

Query: 4 AKAFPKLIKTISiniESHRQL- - -RFDDNGSLSLQKKEAKKKEITVI«3LFRANKAGra^ 59 
A+ F +L+K + LE + R D G +K ++G A+ GF FL 

Sbjct: 36 AEEFKELVKALVfiLEEKGLlVRTRSDRYG IPEKMNLIKGKISaEffiKGFAFIiL 87 

Query: 60 SIDQDEDDMFIGKNDIAYAIDGDTVEAWKKPADRLNGTAAEARWNIVERSLKTLVGKF 119 
D D+FI N++ A++GD V + + +G+ E V+ I+ER+++ +VG + 

Sbjct: 88 PEDTSLSDVFIPENELlSITAl«Il!KDIVMVRmSQS---SGSRQEGTV^ 144 

Query: 120 VLDDERPKYAGYIKSKIIQKINQKIYIRKEPV--VI1DGTEIIKVDIDKYPTRGHDYFVASV 177 

+ G++ ++KI r+r K +G +++ V + YP G V 

Sbjct: 145 T ETRNFGFVIPDDKKITSDIFIPKNGKNGAAEGHKW-VKLTSYP-EGRMNAEGEV 198 

Query: 178 RDIVGHQGDVGIDVtEVIiESMDIVSEFPEDVIAEfllJAIPDAPTEKDIilGRVDLRQEVTFT 237 

I+GH+ D GID+L V+ + EFP D + +A++ PD EKDL R DLR +V T 
Sbjct: 199 ETILGHKiroPGIDILSVIHKHGLPGEFPAnAMEQASSTPDTIDEKDIiKDRRDLRDQVIVT 258 

Query: 238 IDGADAKDLDDAVHIKLLDNGHFELGVHIADVSYYVTEGSALNREALSRGTSVYVTDRW 297 

IDGADAKDUDDAV + LD+G ++LGVHIADVS+YVTE S +++EAL RGTSVY+ DRV+ 
Sbjct: 259 IDGADAKDLDDAVTVTKLDDGSYKLGVHIAOTSHYVTENSPIDKEftLERGTSVYLVDRVI 318 

50 Query: 298 PMLPERLSNGICSLNPNLDRLTQSCIMEIDQNGRWNHQITQSVINTTYRMTYTAVNDII 357 

PM+P RLSNGICSLNP +DRLT SC M 1+ G+V H+I QSVI TT RMTY+ VN 1+ 
Sbjct: 319 PMIPHRLSNGICSUIPKVDRLTLSCEMTINSQGQVTEHEIFQSVIKrrERMTYSDVNKIL 378 

Query: 358 A-GDEEICSEYESIVSSVQHMVTLHHTLEAMRTRRGALNFDTSEAKIMVNDKGMPVDIVI 416 

DEE+ +YE +V + M L L R RGA++FD EAK++V+D+G D+VI 

Sbjct: 379 VDDDEELKQKYEPLVPMFKDMERIAQILRDKRMDRGAyDFDFKEAKVLVDDEGAVKDVVI 438 



Query: 417 RmGIAERMIESFMLAANETVAEHYARLKLPFIYRIHEEPKAEKLQKFIDYASVFGVQIQ 476 
R R +AE++IE PML fiNETVaEH+ + +PFIYRIHEEP AEKLQKF+++ + PG ++ 
Sbjct: 439 RERSVAEKLIEEFMLVANETVAEHFHWNMVPFIYRIHEEPNAEKLQKFLEFVTTFGYVVK 498 

Query: 477 GTATKITQSALQDFMKKVQGQPGSEVLSMMLLRSMQQAEYSEHNHGHYGLAAEYYTHFTS 536 

GTA I ALQ + V+ +P . V+S ++LHSM+QA+Y + GH+GL+ E+YTHFTS 
Sbjct: 499 GTAGNIHPRALQSILDAVIUDRPEEWISTVMLRSMKQAKYDPQSLGHFGLSTEFYTHFTS 558 

Query: 537 PIRRYPDLLVHRMIRDY-DDKaMDKA--DHFANLIPEIATQTSSIiERRAinftERIVEftMK 593 

PIRRYPDL+VHR+IR Y + +D+A + +A +P+IA TSS+ERRA+DAER + +K 
Sbjct: 559 PIRRYPDLIVHRLlRTYLINGKVDEATQEKWAERLPDIAEHTSSMERRAVDftERETDDLK 618 

Query: 594 KAEYMEEYVGEEFEGWASWKFGMFVELPNTIEGLIHVTTL-PEYYHFNERTLTLQGEK 652 

KAEYM + +GEEF+G+++SV FGMFVELPNTIEGL+HV+ + +YY F+E+ + GE+ 
Sbjct: 619 KAEYmjDKIGEEFDGMISSVTNFGMFVELPISrriEGLVHVSFMTDDYYRFDEQHFAMIGEiR 678 

Query: 653 SGKVFRVGQQIKVKLIRSDKETGDIDFDYLPSDFDIVEKVSKSSREGRENRSSKREHQHR 712 

+G VFR+G +1 VK++ +K+ +IDF+ + +G P R + + 
Sbjct: 679 TGNVFRIGDEITVKWDVNKDEKNIDFEIV GMKGTPRRPRELD 721 

Query: 713 ISDRDNKNKOTSKKKASRKPKRNSDSKSHHHKDDRTTGSTKKKTKKPFYKGVAKIC^ 772 
S R K ++K+ S + S K + T KKK K+ F +K +K+K 

Sbjct: 722 -SSRSRKRGKPARKRVQSTNTPVSPAPS-EEKGEWFTKPKKKKKKRGFQISIAPKQK^^ 779 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6797> which encodes the amino acid 
sequence <SEQ ID 6798>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results
bacterial cytoplasm --- Certainty=0.0811 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 579/773 (74%) , Positives = 664/773 (84%) , Gaps = 22/773 (2%) 

Query: 1 mGAKAFPKLIKTISNLESHRQLRFDDNGSLSLQKKEAKKKEITVRGLFRftNKAGFGPLS 60 

MAGAK FP LIKTIS +ES LRF D+GSL+L+K+ KKKE TV+G+FRA3SlKaGFGFIi 
Sbjct: 27 MA(33Ua^FPSLIKTISK^ffiSQSIll,RFSDDGSLaIlRKEREKKKEPTVQ^ 86 

Query: 61 IDQDEDDMFIGKlTOIAYRIDGDTVEAVVKKPADRriNGTAAEaRVVNIWRSriKTL^ 120 

+D++EDDMPI6+ND+ YAIDGDTVE WKKPADRL GTAAEA+W IV+RSLKT VG F+ 
Sbjct: 87 VDENEDDMPIGRNDVGYAIDGDTVEVWKKPADRLKBTAAEAroTVAIVDRSLKTAVGTFI 146 

Query: 121 LDDERPKYAGYIKSKNQKINQKIYIRKEPWLDGTEIIKVDIDKYPTRGHDYFVASVRDI 180 
LDD++PKYAGYI+SKNQKI QKIYI+KEPWL GTEIIKVDIDKYP RGHDYFVASVRDI 

Sbjct: 147 tDDDKPKYAGYIRSKNQKIQQKIYIKKEPWLKGTEIIKVDIDKYPIRGHDYFVASVRDI 206 

Query: 181 VGHQGDVGIDVLEVLESMDIVSEFPEDVIAEaiaiPDAPTEKDLIQRVDLRQEVTFTIDG 240 
VGHQGDVGIDVLEVLESMDIVSEFP +V+AEANAI +APT KDLIGRVDLRQE T TIDG 
Sbjct: 207 VGHQGDVGIDVLEVLESMDIVSEFPAEVLAERNAISEftPTAKDLIGRVDLRQETTITIDG 266 

Query: 241 ADAKDLDDAVHIKLLDNGHFELGVHIADVSYYVTEGSAIJilREALSRGTSVYVTDRWPML 300 

ADAKDIiDDA+HXKLLDNG++ELGVHIADVSYYVTEGSAL++EA++RGTSVYVTDRWPML 
Sbjct: 267 AnRKDLDDAIHIKLLDNGNYELGWIADVSYYVTEGSariDKEAIARGTSVYVTDRVVPML 326 

Query: 301 PERLSNGICSUIPNLDRLTQSCIMEIDQNGRVVNHQITQSVINTTYRMTYTAVNDIIAGD 360 

PERLSNGICSLNPN+DRLTQS +MEI+ G WN+QI QSVI TTYRMTY+ VND+IAGD 
Sbjct: 327 PERLSNGICSIJSIPNlDRLTQSAIMEINSQGHVVlSnfQICQSVIKTTYRjnTSTVK^ 386 

Query: 361 EEICSEYESIVSSVQHMVTLHHTLEAMRTRRGALNFDTSEAKIMVNDKGMPVDIVIRMRG 420 

EE E+ SI V MV LH LEAMR++RGALNFDT EAKI+VNDKGMPVD+V+R RG 
Sbjct: 387 EEALQEFASIADDVTL^mLHRIIiEA^KSKRGA[OTDTQEAKlIVNDKGMPVDVVIlRQRG 446 

Query: 421 lAERMIESFMIJUaffiTWAEHYARLKLPFIYRIHEEPKAEKLQKPIDYaSVFGVQIQGTAT 480 
lAERMIESFMLAANE VAEH+Ah- KLPFIYRIHEEPKAEKLQ+FIDYAS FG+ IQ6TA
Sbjct: 447 lAERMIESFMLflfllffiCraEHFAK^UCLPFIYRIHEEPKaEKLQQFIDYASTPGIHl^ 506

Query: 481 KITQSALQDFMKKVQGQPGSEVLSMMLLRSMQQARYSEHNHGHYGLAAEYYTHFTSPIRR 540
KI+Q ALQ FM KV+GQPG+EVL+MMLLRSMQQARYSEHNHGHYGLAAEYYTHFTSPIRR
Sbjct: 507 KISQEaLQAFMaKOTGQPGftEVLNMMLLRSMQQARYSEHiraGHYGLAftE^^^ 566

Query: 541 yPDLLVHRMIRDYDDKftMJKADHPAimiPEIATQTSSLERRAIDAERIVEAMKKAEY*^ 600
YPDLLVHRM+R+y+ + +K DHPA +IPE+AT +S LERRAIDAER+VEftMKKREYM E
Sbjct: 567 YPDLLVHRMVEEYNQPSQEKRDHFAQIIPEIiATSSSQLERRAIDAERVVERMKKftEyMft^ 626

Query: 601 YVGEEFEGWASWKFGMFVELPNTIEGLIHVTTLPEYYHENERTLTLQGEKSGKVFRVG 660
YVGEEF+G+V+SWKFG FVELPNTIEGL+H+T+LPEYYHFNERTL+1<3GEKSGKVF+VG
Sbjct: 627 YVGEEFDGIVSSVVKFGFFVELEOTIEGIjVHITSLPEYYHETSERTLSLQGEKSGKVFKVG 686

Query: 661 QQIKVKLIRSDKETGDIDPDYIiPSDPDIVEKVSKSSREGRPNRSSKREHQHRISDRDNKN 720
Q I+VKL+++DKETGDIDF+YLPSDFD+VEK+ S + R +R K+
Sbjct: 687 QPIRVKLVKADKETGDIDFEYLPSDFDWEKIKMSDKRSRRDR RKS 732

Query: 721 KOTSKKKASRKPKRNSDSKSHHHKDDRTTGSTKKKTKKPFYKGVAKKGQKRKS 773
+SK ++PK + +K T G TKK +KKPFYK AKK +++S
Sbjct: 733 SKSSKBTKKKEPKEVaiCaK TKGKTKKGSKKPFYKEQaKKKSRKRS 777





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2202 

A DNA sequence (GBSx2321) was identified in S.agalactiae <SEQ ID 6799> which encodes the amino 
acid sequence <SEQ ID 6800>. This protein is predicted to be VacB homolog (smpB). Analysis of this 
protein sequence reveals the following: 
Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2988 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
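
The localisation verdict in a "Final Results" block is simply the compartment with the highest certainty value. The following is a minimal Python sketch of how such a block could be read programmatically; the function names are hypothetical, and the pattern assumes cleanly typed lines (it tolerates a stray space around the decimal point but not other scanning errors).

```python
import re

# Hypothetical helper: extracts (location, certainty, verdict) from
# PSORT-style "Final Results" lines. Names here are illustrative only.
LINE_RE = re.compile(
    r"bacterial\s+(?P<loc>cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(?P<int>\d)\s*\.\s*(?P<frac>\d+)\s*"
    r"\((?P<verdict>[^)]*)\)"
)

def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for m in LINE_RE.finditer(text):
        certainty = float(f"{m['int']}.{m['frac']}")
        results.append((m["loc"], certainty, m["verdict"].strip()))
    return results

def predicted_location(text):
    """Return the location with the highest certainty, or None."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1])[0] if results else None

example = """
bacterial cytoplasm --- Certainty=0.2988 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(predicted_location(example))
```

For the block above this selects the cytoplasm, in agreement with the "Affirmative" flag.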

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC23745 GB:AF052209 VacB homolog [Streptococcus pneumoniae]
Identities = 121/155 (78%), Positives = 139/155 (89%)

Query: 1 MVKGQGWWAQNKKAHHDYTIVETIEAGIVLTGTEIKSVRAARITLKDGYAQIKNGEAWL 60 

M KG+G WAQNKKA HDYTIV+T+EAG+VLTGTEIKSVRAARI LKDG+AQ+KNGE WL 
Sbjct: 1 MAKGEGKVVAQNKKARHDYTIVDTLEAGMVLTGTEIKSVRAARINLKDGFAQVKNGE 60 

Query: 61 INVHITPYDQGNimQDPDRTRKLLLKKREIEKISlJELKGTGMTLVPLKVYLKDGFAKVL 120 

NVHl PY++GNIWNQ+P+R RKLLL K++I+K+ E KGTGMTLVPLIWY+KDG+AK+L 
Sbjct: 61 SNVHIAPYEEGNIWNQEPERRRKLLLHKKQIQKLEQETKGTGMTLVPLJCVYIKDGYAKLL 120 

Query: 121 LGLAKGKHDYDKRESIKRREQNRDIARQLKNYNSR 155 

LGLAKGKHDYDKRESIKRREQNRDIAR +K N R 
Sbjct: 121 LGLAKGKHDYDKRESIKRREQNRDIARVMKAVNQR 155 
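
In these listings the unlabelled middle line of each alignment block carries the comparison: a letter marks an identical residue and "+" marks a conservative substitution. The following Python sketch, under that reading, counts both for a single block; the function name is hypothetical, and the counts cover only the block supplied, not the whole alignment.

```python
def midline_counts(midline):
    """Count identities and positives in one BLAST midline.

    Letters are identical residues; '+' marks a conservative substitution.
    Positives are identities plus conservative substitutions.
    """
    identities = sum(ch.isalpha() for ch in midline)
    conservative = midline.count("+")
    return identities, identities + conservative

# Midline of the last block (residues 121 to 155) of the alignment above.
midline = "LGLAKGKHDYDKRESIKRREQNRDIAR +K N R"
identities, positives = midline_counts(midline)
print(identities, positives)
```

Summing these per-block counts over every block of an alignment should give the numerators reported in its "Identities" and "Positives" header, provided the midlines are intact.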

A related DNA sequence was identified in S.pyogenes <SEQ ID 6801> which encodes the amino acid 
sequence <SEQ ID 6802>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2918 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 124/155 (80%) , Positives = 145/155 (93%) 

Query: 1 MVKGQGNWAQNKKAHHDYTIVETIEAGI\ni,TGTEIKSVRAARITLKDGYAQIKNGEAWL 60 

M KG+G+++AQNKKA HDY IVET+EAGIVLTGTEIKSVRAARI LKDG+AQIKNGEAWL 
Sbjct: 1 MAKGEGHILAQNKKARHDYHIVETVEAGIVLTGTEIKSVRAARIQLKDGFAQIKNGEAWL 60 

Query: 61 INVHITPYDQGNIWQDPDRTRKIiLLKKREIEKISNELKGTGMTLVPLKVYLKDGFAKVL 120 

+NVHI P++QGNIWN DP+RTRKLLLKKREI ++ilffiLKiG+GMTLVPLKVyLKDGFAKVL 
Sbjct: 61 VNVHIAPFEQGNIWNADPERTRKriLKKREITHLftNELKGSGMTLVPLK^ 120 

Query: 121 LGIiAKGKHDYDKRESlKRREQNRDIARQLKNXNSR 155 

+GIAKGKH+YDKRE+1KRR+Q RDI +Q+K+YN+R 
Sbjct: 121 IGLAKGKHEYDKKETIKRRDQERDIKKQMKHYNRR 155 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2203 

A DNA sequence (GBSx2322) was identified in S.agalactiae <SEQ ID 6803> which encodes the amino 
acid sequence <SEQ ID 6804>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6876 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2204 

A DNA sequence (GBSx2323) was identified in S.agalactiae <SEQ ID 6805> which encodes the amino 
acid sequence <SEQ ID 6806>. This protein is predicted to be d-serine/d-alanine/glycine transporter (cycA). 
Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9397> which encodes amino acid sequence <SEQ ID 9398>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14651 GB:Z99117 amino acid permease [Bacillus subtilis] 
Identities = 165/361 (45%), Positives = 227/361 (62%), Gaps = 17/361 (4%)

Query: 1 MGIFLT-LSYWISLIFIGMaEITAVGEYVQFWFPEWPSWIIQIVFLAILSSINLIAVKAF 59 

M F+T +YW 1 + Mft++rAVG Y Q+W P+ P M+ ++ L Ih +IIL VK F 
Sbjct: 95 MftAFITGVn™PCWISIAMaDLTAVGIYTQyra,PDVPQWLPGLI^ 154 

Query: 60 GETEFWFAMIKVIAILGLIATGIF^WLT^IFDTGHGYHASISNITmFEWPPIraKLNF™a 119 

GE EFWFA+IKVIAIL LI TGI ++ F G AS++N+ +H PP G F ++ 
Sbjct: 155 GElBFWFALIKVIAIXJU^IVTGILLlAKGFSaASG-PASIJJJNLWSHGGMPPNGW^^^ 213 

Query: 120 FQiWFPAYLAIEFVGVTTSETANPRKVLPKAlQEIPMRIILFYAGSLLAIMAIFPWQQLP 179 

FQMV FA++ IE VG+T ET NP+KV+PKAI +IP+RI+LFY G-fL IM I+PW L 
Sbjct: 214 FQMWFAFVGIELVGLTAGETENPQKVIPKAINQIPVRILLFYVGALFVIMCIYPWNVLN 273 

Query: 180 VNESPFVTVFKlAGIKWAAALINFVVLTSAASAIJISTLYSTGRHLFQrANE--SES^ 237 

NESPPV VF GI AA+LINFWLTSAASA NS L+ST R ++ lA + +P L K 
Sbjct: 274 PNESPPVQVFSAVGIWftASLINFWLTSAASAfiNSALFSTSRMVYSLAKDHHAPGLLKK 333 

Query: 238 ALKLDQLSRQSVPSRAIIAS--AVIVGflSALISVLPGISDAFSLITASSSGVYISIYVLI 295 

L+ +VPS A+ S A+++G S L ++P F+LIT+ S+ +1 1+ + 

Sbjct: 334 LTSSWVPSliMFFSSIAILIGVS-IJm«P--EQVFTLITSVSTICFIFIWGIT 384 

Query: 296 MIAHWKYRKS- -PDF^ffiDGYKMPAYKILSPITLLFFLPVFVSLPLQDSTYIGAIGATIWII 354 

+1 H KYRK+ + + +KMP Y + + +TL F F+ V L L + T I +W + 

Sbjct: 385 VICHLKYRKTRQHEAKaNKFKMPFYPLSNYLTIAFLAFILVILALANDTRIALFVTPW 445 
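
The percentages in an alignment header follow from the fractions printed beside them. The values in this listing appear consistent with truncation rather than rounding (for example, 227/361 is about 62.9% but is reported as 62%). The following Python sketch recomputes them on that assumption; the helper name is hypothetical and truncation is inferred from the printed figures, not from any documentation of the alignment program.

```python
import re

# Matches fields such as "Identities = 165/361 (45%)".
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)

def check_header(header):
    """Map each field to (numerator, denominator, reported %, recomputed %).

    The recomputed value uses integer division, i.e. truncation
    (an assumption inferred from the listing).
    """
    out = {}
    for name, num, den, pct in STAT_RE.findall(header):
        num, den, pct = int(num), int(den), int(pct)
        out[name] = (num, den, pct, (100 * num) // den)
    return out

header = "Identities = 165/361 (45%), Positives = 227/361 (62%), Gaps = 17/361 (4%)"
for name, (num, den, reported, recomputed) in check_header(header).items():
    print(name, f"{num}/{den}", reported, recomputed)
```

A mismatch between the reported and recomputed values for a header is a useful flag that the fraction or the percentage was misread during scanning.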

There is also homology to SEQ ID 4070: 

Identities = 286/364 (78%), Positives = 322/364 (87%), Gaps = 1/364 (0%) 

Query: 2 GIPLTLSYWISLIFIGMAEITAVGEYVQFWFPEWPSWIIQIVFLAILSSINLIAVKAFGE 61 

G P LSYWISLIFIGMAEITAVG YVQPWFP WP+W+IQ+VFL +LSSINLIAV+ FGE 
Sbjct: 101 GYFSGLSYWISLIFIGMftEITAVGAYVQFWFPSWPAWLIQLVFLVLLSSIKLIAVRVFGE 160 

Query: 62 TEPWFAMIKVIAILGLIATGIF^m,TNFDTGHGYHASISNITNHFEWFPRGKIJ!IFPMAFQ 121 

TEFWFAMIK++AIL LIAT IFMVLT F+T H HAS+SNI +HF FP GKL FFMAFQ 
Sbjct: 161 TEFWFAMIKIIAILALIATAIFMVLTGFET-HTGHASLSNIFDHFSMFENGKLKFFMAFQ 219 

Query: 122 MVPFAYIAIEFVGVTTSETANPRKVLPKAIQEIPMRIILFYAGSLLAIMAIFPWQQLPW 181 

MVPFAY AIEFVG+TTSETANPRKVLPKAIQEIP RI++FY G+L++IMAI PW QLPV+ 
Sbjct: 220 MVPFAYQAIEFVGITTSETANPRKVLPKAIQEIPTRIVIFYVGALVSIMAIVPWHQLPVD 279 

Query: 182 ESPFVWFKrAGIKWAAALINFVVLTSAASAXJJSTLYSTGRHLFQMJESPNA^ 241 

ESPFV VFKL GIKWAAALINFWLTSAASatNSTLYSTGRHL+Q+ANE+PMaLT LK+ 
Sbjct: 280 ESPFVMVFKLIGIKWAftALINFVVLTSRASftIi^STLYSTGRHLYQIANETP^IALMRLKI 339 

Query: 242 DQLSRQSVPSRAIIASAVIVGASALISVLPGISDAFSLITASSSGVYISIYVLIMIAHWK 301 

+ LSRQ VPSRAIIASAV+VG SALI++I,PG++DAFSLITASSSGVYI+IY L MIAHWK 
Sbjct: 340 OTLSRQGVPSRAIIASAVVVGISALINILPGVADAFSLITASSSGVYIAIYaLTMIflHWK 399 

Query: 302 YRKSPDFMEDGYKMPAYKILSPITLLFFLFVFVSLFLQDSTYIGAIGATIWIIGiraLYSH 361 

YR+S DFM DGY MP YK+ +P+TL FF FVF+SLFLQ+STYIGAIGATIWII FG+YS+ 
Sbjct: 400 ■SRQSKDFMftlXSYIMPKYKm'PLTIAFFAFVFISI.FLQESTYIGAIGATI^^ 459

Query: 362 FKHK 365 
K K 

Sbjct: 460 VKFK 463 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2205 

A DNA sequence (GBSx2324) was identified in S.agalactiae <SEQ ID 6807> which encodes the amino 
acid sequence <SEQ ID 6808>. Analysis of this protein sequence reveals the following:

Possible site: 38 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC95438 GB:AF068901 unknown [Streptococcus pneumoniae] 
Identities = 80/214 (37%) , Positives = 122/214 (56%) , Gaps = 3/214 (1%) 



Query: 4 FFSNIRTEIPQMPLLIHSLILSVLPPLMWLTLVNRDKPLYKTIWSILLGLQLITIYTWFF 63 

FF+ T+ P+ L + + ++L + R+K +Y+ + IL +QLI +Y W++ 

Sbjct: 7 FFTTQATKPPKFDLFWWSLFTLIALTFYTAHRYREKKVVQRFFQILQTVQLILLYGWYW 66 

Query: 64 WAKLPLSESLPLyHCRIGMFVVLLARPGI--LKDyFALLGWGGVLftMIHPDFYPYQFLH 121 

+PLSESLP YHCR+ MFWLL PG K YFALLG G + A ++P Y F H 
Sbjct: 67 VNHMPLSESLPFYHCRMAMFWLLL-PGQSKYKQYFALLGTFGTLAAFVYPVPnAYPFPH 125 

Query: 122 VTNIFFFIGHFALFVLSLLHLMTQSNLDKLNPKLIIQLTLLINMSLIPINLLTGGNYGFM 181 

+T + F GH AL SL++L+ Q N L+ K I +T +N + +NL+TGG+YGF+ 
Sbjct: 126 ITILSFIFGHLALLGNSLVYLLRQYNARLLDVKGIFLMTFALNALIFVVNLVTGGDYGFL 185 

Query: 182 MKTPILGITNPFLNLFIVTTLLSFLVLFVKQIPQ 215 

K P++G N +V+ +L + K+I + 

Sbjct: 186 TKPPLVGDHGLVANYLLVSIVLVATISLTKKILE 219 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6809> which encodes the amino acid 
sequence <SEQ ID 6810>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC95438 GB:AF068901 unknown [Streptococcus pneumoniae] 
Identities = 90/231 (38%) , Positives = 128/231 (54%) , Gaps = 7/231 (3%) 

Query: 3 FFAIDPIGLPHTSLIFYLSSLLIALLLVFLTFQAYRLKS-HRYFFLFLQLSQVIGLYTWY 61 
FF P L +Y+S L L L F T YR K . ++ FF LQ . Q+I LY WY

Sbjct: 7 FFTTQATKPPKPDLFWYVS-LFTLLALTFYTAHRYREKKVYQRFFQILQTVQLILLYGWY 65 

Query: 62 VLRGPPLDEALPLYHCRIAMLAIFFLPDRNKFKQLF.MVLGIGGTFIALL--SPDLYPPRL 119 
+ PL E+LP YHCR+AM + LP ++K+KQ F +LG GT A + PD YPF

Sbjct: 66 VraraMPLSESLPFYHCRMAMFVVIjLLPGQSKXKQYFALLGTFGTLAAFVYPVPDA^ 124 

Query: 120 WHVANVSFYFGHYALLVNGLIYLLRFYDASQLRLLSWRYLATVNFLLLLVSLATKOIYG 179 

H+ +SF FGH ALL N L+YLLR Y+A L + + +N L+ +V+L T G+YG 

Sbjct: 125 -HITILSFIFGHLALLGNSLVYLLRQYNARLLDVKGIFLMTFALNALIFVWLVTGGDYG 183

Query: 180 FVMDIPVIHTRHLLLNFVIVTSGLTFMVKITEYFYLKFGEAQQLALAFSKE 230 

F+ P++ L+ N+++V+ L + +T+ L+F AQ+ KE 
Sbjct: 184 FLTKPPLVGDHGLVaNYLLVSIVLVRTISLTKKl-LEFFLAQEAEKMIVKE 233 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 70/216 (32%) , Positives = 117/216 (53%) , Gaps = 1/216 (0%) 

Query: 2 lEFFSNIRTEIPQMPLLIHSLILSVLPFLMWLTLVNRDKPLYKTIWSILLGLQLITIYTW 61 

++FF+ +P L+ + L + L++LT ++ + L Q+I +YTW

Sbjct: 1 MDFFAIDPIGLPHTSLIFYLSSLLIALLLVFLTFQAYRLKSHRYFFLFLQLSQVIGLYTW 60 

Query: 62 FFWaKLPLSESLPLYHCRIGMFWL-LaRPGILKDYFALLGWGGVLAMIHPDFYPYQPL 120 
+ PL E+LPLYHCKI M + L K F +LG+ G LA++ PD YP++ 

Sbjct: 61 YVLR6FPLDEALPLYHCRIAMLAIPFLPDRNKFKQLFMVLGIGGTFLALLSPDLYPFRLW 120

Query: 121 HVTNIFFPIGHFALFVLSLLHL^^^QSlmDKIJi^PKLIIQLTLLIlmSLIFINLLTGGN^^ 180 

HV N+ F+ GH+AL V L++L+ + +L +++ +N L+ ++L T GNYGF 
Sbjct: 121 HraOTSFYFGHYALLWGLIYLLRFYDASQLRLLSVTOYLATVNFLLLLVSLATKGNYGF 180 

Query: 181 MMKTPILGITNPFLNLFIVTTLLSFLVLFVKQIFQK 216 

+M P++ + LN IVT+ L+P+V + + K 
Sbjct: 181 VMDIPVIHTRHLLLNFVIVTSGLTPMVKITEYPYLK 216 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2206 

A DNA sequence (GBSx2325) was identified in S.agalactiae <SEQ ID 6811> which encodes the amino
acid sequence <SEQ ID 6812>. Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3297 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2207 

A DNA sequence (GBSx2326) was identified in S.agalactiae <SEQ ID 6813> which encodes the amino 
acid sequence <SEQ ID 6814>. This protein is predicted to be oxalate:formate antiporter (oxlT-2). Analysis
of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF36228 GB:AF168363 oxalate:formate antiporter [Lactococcus lactis]
Identities = 220/398 (55%), Positives = 306/398 (76%), Gaps = 3/398 (0%)

Query: 5 NRYWAVSGWLHLMLGSTYAWSVFRNPIISETGWDISSVSFAFSLAIFCLGMSAAFMGH 64 

NRYWR +GV+ HLM+GS YAWSVF NPI + GW SSV+ AFS+AI+ LGMSAAFMG 
Sbjct: 4 NRWVAFAGVMFHLMIGSVYAWSVFTNPIAKQNGWAESSVAIAPSIAIYFLGMSAAFMGK 63 

Query: 65 LVERFGPRIMGMISAILYGAGNVLTGLAIETQQLWLLYVAYGILGGIGLGSGYITPVSTI 124 

+VE+ GPR+ G I++ LYG G ++TG AI +WLLY++yG++GG+GLG+GY+TPVSTI 
Sbjct: 64 WEKIGPRLTGTIASPLYGTGTIMTGWAIHQNSIWLLYLSYGVIGGLGLGAGYVTPVSTI 123 

Query: 125 IKWFPDRRGLATGFAIMGFGFASLVTSPLAQSLMIRIGVGKTFYILGLVYPFVMMIASQF 184 

IKWFPD+R6LATG A1MGFGFA+++T P+AQ LM +G+ +TFY+LG YF +M++A+QF 
Sbjct: 124 IKWFPDKRGLATGLAIMGFGFAAMLTGPVAQQLMASVGLEQTFYLLGTFYPVIMLLAAQF 183 

Query: 185 IKQPPQEKITILTHpGKKNAMNSQIITG--LKANAAlKSKTFYIIWLTLFlNISCGLGLI 242

I + P ++ T + +++ G L AN A+K+K+F +W+ FINI+CG+GL+ 

Sbjct: 184 IVR-PNLALSSTTENSISQKHGTRLTRGPELTANQALKTKSFTFLWIMFFINITCGIGLV 242 

Query: 243 SAASPMAQDLAGYSAESAALLVGVLGIENGFGRLLWASLSDYIGRPLTFIILFIVNFIMT 302 

SAASPMAQ + G S ++AA++VG++G+ENGFGRL+WA+LSDYIGRP TF +FI++ +M 
Sbjct: 243 SAASPMAQSMTCMSVQTAAIMVGIIGLFNGFGRLIWATLSDYIGRPATFSAIFILDIVML 302 

Query: 303 SSLFLSFNAIVFAIAMSILMTCYGAGFSLLPAYLSDIFGTKELATLHGYSLTAWAIAGLF 362 

S++ + ++F IA+ +LM+CYGAGFS++PAYL D+FGTKEL +HGy LTAWA AG+ 
Sbjct: 303 SAILIFKLPLLFVIALCLLMSCYGAGFSVIPAYLGDVFGTKELGAVHGYVLTAWAAAGW 362 

Query: 363 GPLLLSKTYSWGNSYQLTLMVFGFLFLFGLLLSLYLRK 400 

GPLLLS T+ ++Y LTL F + L LL+S ++++ 
Sbjct: 363 GPLLLSLTHQLFHNYTLTLAAFILIDLLALLISFWIQR 400 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6815> which encodes the amino acid 
sequence <SEQ ID 6816>. Analysis of this protein sequence reveals the following: 

Possible site: 27 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.6180 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF36228 GB:AF168363 oxalate:formate antiporter [Lactococcus lactis]
Identities = 222/399 (55%), Positives = 305/399 (75%), Gaps = 3/399 (0%)

Query: 3 KTKRYIIATAGILmLMLGSTyAWSVYRNPILQETGTOQAPVaFAPSLAIFCLGLSAAFM 62 

KT RY++A AG++ HLM+GS YAWSV+ NPI ++ GW ++ VA AFS+AI+ LG+SAAFM 
Sbjct: 2 KTNRYWAFAGVMFHLMIGSVYAWSVFTNPIAKQNGWAESSVALAFSIAIYFLGMSAAFM 61 

Query: 63 GNLVEQYGPRLTGTVSAILYASGNMLTGLAIDRKEIWLLYIGYGVIGGLGLGAGYITPIS 122 

G +VE+ GPRLTGT+++ LY +G ++TG AI + IWI1LY+ YGVIGGLGLGAGY+TP+S 
Sbjct: 62 GKVVEKIGPRLTGTIASFLYGTGTIMTGWaiHQKSIWLLYLSYGVIGGLGLGAGYVTPVS 121 

Query: 123 TIIKWFPDKRGP5ATGPAIMGFGEASLLTSPIAQWLIETEGLVATFYLLGLIYLIVMLFAS 182

TIIKWFPDKRG+ATG AIMGFGFA++LT P+AQ L+ + GL TFYLLG Y ++ML A+ 
Sbjct: 122 TIIKWFPDKRGLATGLAIMGFGFAAMLTGPVAQQLMASVGLEQTFYLLGTFYFVIMLLAA 181 

Query: 183 QLIIKPTAAEIAII1DKKRLQ-HNSYLIEG--MTAKEALKTKSFYCLWVILFINITCGLGL 239 
Q I++P A + + Q + L G +TA +ALRrKSF LW++ FINITCG+GL

Sbjct: 182 QFIVRPNLALSSTTENSISQKKGTRLTRGPELTANQALKTKSFTFLWIMFFINITCGIGL 241 

Query: 240 ISWAPMAQDLTGMSPEMSAIWBftMGIENGFGRLVWASLSDYIGRRVTVILLFLVSIIM 299 

+S +PMAQ +TGMS + +AI+VG +G+FNGFGRL+WA+LSDYIGR T +F++ I+M 
Sbjct: 242 VSAASPMAQSMTGMSVQTAAIMVGIIGLFNGFGRLIWATLSDYIGRPATFSAIFILDIVM 301

Query: 300 TISXjIFAHSSLIFMISIATLMTCYGAGFSLIPPYLSDLFGAKELATIiHGYILTAWAIAAL 359 

+++ L+F+I++ LM+CYGAGFS+IP YL D+PG KEL +HGY+LTAWA A + 

Sbjct: 302 LSAILIFKLPIiFVIALCLLMSCYGRGFSVIPAYLGDVFGTKELGAVHGYVLTAWAAAGV 361 

Query: 360 TGPMLLSITVEWTHNYLLTLCVFIVLYILGLMVALRLKK 398 

GP+LLS+T + HNY LTL FI++ +L L+++ +++ 
Sbjct: 362 VGPLLLSLTHQLFHNYTLTLAAFILIDLLALLISFWIQR 400 

An alignment of the GAS and GBS proteins is shown below.

Identities = 252/400 (63%) , Positives = 329/400 (82%) , Gaps = 2/400 (0%) 

Query: 1 MK^^LmYVVAVSGVVLHLMLGSTYAWSVFRNPIISETGWDISSVSFAFSLAIFCLGMSAA 60 
M+ RY++A +G++LHLMLGSTYAWSV+RNPI+ ETGWD + V+FAFSLAIFCLG+SAA 
Sbjct: 1 MEKTKRYIIATAGILlJfflliMI^SlYAWSVYimPILQETGWDQAPVAFAFSIAIFCIXSLSAA 60

Query: 61 FMGHLVERFGPRIMGMISAILYGAGNVLTGLAIETQQLWLLYVAYGILGGIGLGSGYITP 120 

FMG+LVE++GPR+ G +SAILY +GN+LTGrAH- +++WI1LY+ YG++GG+GLG+GYITP 
Sbjct: 61 F^KSKCiVEQYGPRLTGTVSAILYASGNMLTGLAIDRKEIWLLYIGYGVIGGLGLGRGYITP 120 

Query: 121 VSTIIKWFPDRRGLATGFAIMGFGFASLVTSPLAQSLMIRIGVGKTFYILGLVYFFVMMI 180 

+STIIKWFPD+RG+ATGFAIMGFGFASL+TSP+AQ I.+ G+ TFY+LGL+Y VM+ 
Sbjct: 121 ISTIIKWFPDKRGMATGFAIMGFGFASLLTSPIAQWLIETEGIiVATFYLIjGLIYIiIVMLF 180 

Query: 181 ASQFIKQPPQEKITILTHDGKKMAMNSQIITGLKANAAIKSKTFYIIWLTIiFINISCGLG 240

ASQ I +P +1 IL D K+ NS +1 G+ A A+K+K+FY +W+ LFINI+CGLG 
Sbjct: 181 ASQLIIKPTAaEIAIL--DKKRLQNNSYLIEGMTAKEALKTKSFYCLWVILFINITCGLG 238 

Query: 241 LISAASPMAQDLAGYSAESAALLVGVLGIFNGFGRLLWASLSDYIGRPLTFIILFIVNFI 300 
LIS +PMAQDri G S E +A++VG +GIFNGFGRL+WASLSDYIGR +T I+LF+V+ I

Sbjct: 239 LISWAPMAQDLTGMSPEMSAIWGAMGIPNGFGRLVWASLSDYIGRRVTVILLPLVSII 298 

Query: 301 MTSSLPLSFHAIVFAIAMSILMTCYGAGPSLLPAYLSDIFGTKELATLHGYSLTAWAIAG 360 
MT SL + ++++F LMTCYGAGFSL+P YLSD+FG KEIiATLHGY LTAWAIA 

Sbjct: 299 MTISLIPAHSSLIFMISIATLMTCYGAGFSLIPPYLSDLFGAKELATLHGYILTAWAIAA 358

Query: 361 LFGPLLLSKTYSWGNSYQLTLMVFGFLFLFGLLLSLYLRK 400

L GP+LLS T W ++Y LTL VF' L++ Gli+H-tL L+K 
Sbjct: 359 LTGPMLLSITVEWTHJmiLTLCVFlVLYILGLMVRLRLKK 398 

A related GBS gene <SEQ ID 8995> and protein <SEQ ID 8996> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: 5.06 
GvH: Signal Score (-7.5): 4.38 

Possible site: 27 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 10 value: -7.80 threshold: 0.0
modified ALOM score: 2.06

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02272(313 - 1500 of 1818)

GP|7107009|gb|AAF36228.1|AF168363_4|AF168363(4 - 400 of 421) oxalate:formate antiporter
{Lactococcus lactis}
%Match = 38.5
%Identity = 55.4 %Similarity = 79.1
Matches = 220 Mismatches = 81 Conservative Sub.s = 94
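
The reported %Identity and %Similarity can be reproduced from the match counts if the denominator is taken to be the length of the database span, "4 - 400", that is 397 positions. That choice of denominator is an assumption made here because it reproduces the printed figures; the listing itself does not state how the percentages were derived, and the %Match value is not addressed. A minimal Python sketch of the arithmetic:

```python
# Counts taken from the summary lines above.
matches = 220
conservative = 94

# Assumption: the denominator is the database span "4 - 400" (397 positions).
span_start, span_end = 4, 400
span_length = span_end - span_start + 1

# Identity counts exact matches; similarity also counts conservative substitutions.
percent_identity = round(100 * matches / span_length, 1)
percent_similarity = round(100 * (matches + conservative) / span_length, 1)

print(span_length, percent_identity, percent_similarity)
```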

216 246 276 306 336 366 396 426 

GK*IC*AENW*YlQFFDNLFITNYIFKNKT*TOP*EDCLKNXiNRYWAVSGVVLHLMLGSTYAWSVFRNPIISETGTO 



MKTNRYWAFAGVMFHLMIGSVYAWSVFTNPIAKQNGWAES 
10 20 30 40 




456 486 516 546 576 606 636 666 

SVSFAFSLAIFCLGMSAAFMGHLVERFGPRIMGMISAILYGAGNVLTGLAIETQQLWLLYVAYGILGGIGLGSGYITPVS 



SVALAFSIAIYFLGMSAAFMGKWVEKIGPRLTGTIASFLYGTGTIMTGWAIHQNSIWLLYLSYGVIGGLGLGAGYVTPVS 
60 70 80 90 100 110 120 



696 726 756 786 816 846 876 906

TIIKWFPDRRGLATGFAIMGFGFASLVTSPLAQSLMIRIGVGKTFYILGLVYFFVMMIASQFIKQPPQEKITILTHD6KK 



TIIKWFPDKRGLATGLAIMGFGFAAMLTGPVAQQLMASVGLEQTFYLLGTFYFVIMLLAAQFIVRP-NLALSSTTENSIS 
140 150 160 170 180 190 200 




936 960 990 1020 1050 1080 1110 1140 

NAMNSQIITG--LKANAAIKSKTFYIIlttiTLFINISCGLGLISAASPMRQDLAGYSaESAAIiVGVLGIENGFGRLLWAS 



QKKGTRLTRGPELTANQALKTKSFTFLWIMFFINITCGIGLVSAASPMAQSMTGMSVQTAAIMVGIIGLFNGFGRLIWAT 
210 220 230 240 250 260 270 280 



1170 1200 1230 1260 1290 1320 1350 1380 

LSDYIGRPLTFIILFIVNFIMTSSLFLSFNAIVFAIAMSILMTCYGAGFSLLPAYLSDIFGTKELATLHGYSLTAWAIAG

llllllll II :||:: =1 I-- = = 1 11= = I h 1 I I 1 II I = = I I I I hllllll =111 Hill 11 

LSDYIGRPATFSAIFILDIWLSAILIFKIiPLLFVIALCLLMSCTGAGFSVIPAYLGDOTGTKELGAVHGYVLTAWAAAG 
290 300 310 320 330 340 350 360 

1410 1440 1470 1500 1530 1560 1590 1620 

LFGPLLLSKTYSWGNSYQLTL^WFGFLFLFGLLLSLYLRKLTTKVV*YISNLKFFGFTKEFFL*KIVLSYSK*FDILSI* 



WGPLLLSLTHQLFHNYTLTLAAFILIDLLALLISFWIQRDFIKASKLIKKQIIKNYFKAH 
370 380 390 400 410 420

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2208 

A DNA sequence (GBSx2327) was identified in S.agalactiae <SEQ ID 6817> which encodes the amino
acid sequence <SEQ ID 6818>. This protein is predicted to be D-Ala-D-Ala adding enzyme (murF). 
Analysis of this protein sequence reveals the following: 



Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1311 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9739> which encodes amino acid sequence <SEQ ID 9740> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC95436 GB:AF068901 D-Ala-D-Ala adding enzyme [Streptococcus pneumoniae]
Identities = 313/453 (69%), Positives = 375/453 (82%)



Query: 32
MKL++HE+A+WGAKN +S FED L EFDSR I GDLF+PLKGARD6H+FIE AF+
Sbjct: 1 MKLTIHEIAQVVGAKNDISIFEDTQIiEKAEFDSRLIGTGDLFVPLKGaRDGHDFIETAFE 60

Query: 92
NGA T+SEKE+ HPY+LV D L AFQ LA YY+EK VDV AVTGSNGRTTTKDM+A
Sbjct: 61

Query: 152
+LST YKTYKTQGNYIWIEIGLPYTVLHMPE TEK++LEMGQDHLGDIH+LSE+A+P+ A+
Sbjct: 121

Query: 212
VTL+6EAHL FF R +IA+GKMQI DGM+S +L+AP DPI++ YLP ++ +RFG
Sbjct: 181

Query: 272
EL++T+L E K SLTFK N LE L +PV GKYNATNaM+A+YV V+EE I A +
Sbjct: 241

Query: 332
+Iri-LTRmTEWKK+ANGADILS0VYNaHPTAM+LILETFSAIP N+GGKKIA+LADMKEL
Sbjct: 301

Query: 392
G QSV LHNQMI+S+ PD +DT+I YG+DI LAQLASQMFPIG VY+FKK ++ DQF+
Sbjct: 361

Query: 452 LLAKVKDTLKEKDQILLKGSNSMNLSKIVDILE 484

L+ +VK++L DQILLKGSNSMNL+ +V+ LE 
Sbjct: 421 LVKQVKESLSANDQILLKGSNSMNLaMLVESLE 453 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6819> which encodes the amino acid
sequence <SEQ ID 6820>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3299 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 323/452 (71%) , Positives = 387/452 (85%) 

Query: 32 MKLSLHE\ffiKra3AKNQVSEFEDVPIfiNIEFDSRNISEGDLPLPLKBAia3GHEFIE 91 
MKL+LHEVAK+V A+N VS+ +DVPL +IEFDSR I++GDLFLPLKiG RDGHEFI++AF 
Sbjct: 1 MKliTLHEVRKIVDftaNNVSDLDDVPLHHIEFDSRKITKGDLFLPLKGQRDGHEPIDLAFQ 60

Query: 92 NGAIATISEKEIEGHPYLLVSDALKAFQVLAQYYIEKMNVDVIAVTGSNGKTTTKDMIAA 151 

NGA+AT SEKE+ G P+LLV D LKAFQ LA yyi+KM VDVIAVTGSNGRT+TKDMI A 
Sbjct: 61 NGAVATFSEKELPGKPHLLVEDCLKAFQKLAHYYIDKMRVDVIAVTGSNGKTSTKDMIGA 120 

Query: 152 ILSTTYKTYKTQGNraNEIGLPYTVXjmPEDTEKIILEMGQDHLGDIHVIiSEIAKPRm^ 211 

+LSTTYKTYKTQGNYNNEIGLPYTVLHMP+DTEKI+LEMGQDH+GDI +LSEIA+PRIAV 
Sbjct: 121 VLSTTYKTYKTQGNVNNEIGLPYTVLHMPDDTEKIVLEMGQDHMGDIRLLSEIARPRIAV 180 

Query: 212 VTLIGEAHLEFFGSREKIAEGKMQITDGMSSDGILIAPGDPIIDPYLPANQMTIRFGHDQ 271

+TL+GEAHLE+FGSR+KIA+GKMQI DGM+SDGILIAPGDPIIDPYIiP NQM IRFG+ Q 
Sbjct: 181 LTLVGEAHLEYFGSRDKIAQGKMQIVDGMNSDGILIAPGDPIIDPYLPENQMVIRFGNQQ 240 

Query: 272 EI<2VTELKEEKHSLTFK!]mLEHQIiRIPVPGKXNATOaWAAOTGKLI^^ 331 
E+ VT ++E+K SLTF TN L + +P+P6KYNATNAMVAAYVGKriLAV +EDI+ AL+

Sbjct: 241 EIDVTGIQEDKDSLTFTTNVIATPVSLPLPGKYNATiaAMVaAyVGKLLAVTDEDIIAaLQ 300 

Query: 332 NLQLTRNRTEWKKSANGADILSDVYNANPTAMRLILETFSAIPNNDGGKKIALLADMKEL 391 
+ LT NRTEWKK+ANGADILSDVYMANPTAMRLILETF+ I N GGKKIA+LADMKEL 
Sbjct: 301 TVTLTGNRTEWKKAANGADILSDVYNANPTAMRLILETFANIAKNPGGKKIAVLADMKEL 360

Query: 392 GEQSVDLHNQMIMSIRPDSIDTLICTGQDIEGLAQLASQMFPIGKVYFFKKNQEVDQFDQ 451 

G+ SV LH+Q+I S+ +ID L+ YG 1+ LA+LASQ++P +V++F K ++ DQF+ 
Sbjct: 361 GKDSVILHSQLIDSLTSGNIDQLVFYGDHIKELARLASQVYPAEQVHYFLKTEQEDQFEA 420 



Query: 452 LLAKVKDTLKEKDQILLKGSNSMNLSKIVDIL 483 

+ V++ L DQILLKGS+SM+L K+VD L 
Sbjct: 421 MAQYVQNIUSrPFDQILLKGSHSMSLEKLVDRL 452 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2209 

A DNA sequence (GBSx2328) was identified in S.agalactiae <SEQ ID 6821> which encodes the amino
acid sequence <SEQ ID 6822>. Analysis of this protein sequence reveals the following: 

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1381 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC95435 GB:AF068901 D-Ala-D-Ala ligase [Streptococcus pneumoniae]
Identities = 243/346 (70%), Positives = 289/346 (83%)

Query: 3 KETLILLYGGRSAEREVSVLSAESVMRAINYDKFFVKTYFITQVGQFIKTQEFDEMPSSD 62 

K+T+ILLYGGRSAEREVSVLSAESVMRA+NYD+F VKT+FI+Q G FIKTQEF P + 
Sbjct: 2 KQTIILLYGGRSAEREVSVLSAESVMRAVNYDRFTVKTFFISQSGDFIKTQEFSHAPGQE 61 

Query: 63 EKL^™QTVDLDKMTOPSDIYD]mIWPVIlHGPMGEDGSIQGF]:JEVLRMPYTO 122 

++LMTN+T+D DK V PS IY++ A+VFPVLHGPMGEDGS+QGFLEVL+MPYVG NILSS 
Sbjct: 62 DRimrmTIDTOKKVAPSAIYEEGAWFPVLHGPMGEDGSVQGFLEVLKMPYVGCNILSS 121 

Query: 123 SVAMDKITTKQVLATVGVPQVAYQTYFEGDDLEHAIKLSLETLSPPIFVKPANMGSSVGI 182 

S+AMDKITTK+VL + G+ QV Y EGDD+ I E L++P+F KP+NMGSSVGI 
Sbjct: 122 SLAMJKITTKRVLESMIAQVPYVAIVEGDDVTAKIAEVEEKLAYPVFTKPSNMGSSVGI 181 

Query: 183 SKATDESSLRSAIDBALKYDSRILIEQGVTAREIEVGILGNNDVKTTFPGEVVKDVDFYD 242 

SK+ ++ LR A+ LA +YDSR+L+EQGV AREIEVG+LGN DVK+T PGEWKDV FYD 
Sbjct: 182 SKSENQEELRQALKLAFRYDSRVLVEQGVNAREIEVGLLGNYDVKSTLPGEWKDVAFYD 241 

Query: 243 YDAKYIDNKITMDIPAKVDEATMEAMRQYASKAFKAIGACGLSRCDFFLTKDGQIFLNEL 302 

YDAKYIDNKITMDIPAK+ + + MRQ A AF+AIG GLSRCDFF T G+IFLNEL 
Sbjct: 242 YDAKYIDNKITMDIPAKISDDVVAVIWQNAETAPRAIGGLGLSRCEFFYTDKGEIFLNEL 301 

Query: 303 NTMPGFTQWSMYPLLWEISIMGLTYSDLIEKLVMLAKEMFEKRESHLI 348 

NTMP6FTQWSMYPLLW+NMG++Y +I1IE+LV lAKE F+KRE+HLI 
Sbjct: 302 NTMPGFTQWSMYPIiLWDlilMGISYPELIERLVDLAKESFDKREAHI.1 347 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4559> which encodes the amino acid
sequence <SEQ ID 4560>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1451 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 261/348 (75%) , Positives = 306/348 (87%) 

Query: 1 MSKETLILLYGGRSAEREVSVLSAESVMRAINYDKFFVKTYFITQVGQFIKTQEFDEMPS 60

MSK+TL+LLYGGRSAEREVSVLSAESVMRA+NYDKF VKTYFITQ+GQFIKTQ+F E PS 
Sbjct: 1 MSKQTLVLLYGGRSAEREVSVLSAESVMRAVim»KFLVKTYFITQMGQFIKTQQFSEKPS 60 

Query: 61 SDEKLMTNQTVDLDKMVRPSDIYDDNAIVFPVLHGPMGEDGSIQGFLEVLRMPYVGTNIL 120 

E+LMTN+T++L + ++PSDIY++ A+VFPVLHGPMGEDGSIQGFLEVLRMPY+GTN++ 
Sbjct: 61 ESERLMTNETIELTQKIKPSDIYEEGAWFPVLHGPMGEDGSIQGFLEVLRMPYIGTNVM 120 

Query: 121 SSSVAMDKITTKQV1ATOGVPQVR.YQTYFEGDDLEHAIKLSLETLSFPIFVKPA1S1MGSSV 180 

SSS+AMDKITTK+VL ++G+PQiWAY Y +G DLE + +L L+FPIFVKPANMGSSV 
Sbjct: 121 SSSIAM)K3:TTKRVIBSIGIPQVRYTVYIDGQDLEACLVETIARLTFPIFVKPA1S1MGSSV 180 

Query: 181 GISKATDESSLRSAIDLALKYDSRILIEQGVTAREIEVGILGNNDVKTTFPGEWKDVDF 240 

GISKA + LR AI LAL YDSR+LIEQGV AREIEVG+LGN+ VK+T PGEV+KDVDF 
Sbjct: 181 GISKAQTKVELRKAIQLaLTYDSRVLIEQGWAREIEVGLLGNDiCVKSTLPGEVIKDVDF 240 

Query: 241 YDYDAKYIDNKIT^mIPAKOTEAT^IEftMRQYASKAPKAIGACGLSRCDFFLTKI)GQIFLN 300 

YDY AKY+DNKITM IPA VD++ + MR YA AFKA+G CGLSRCDFFLT+DGQ++LN 
Sbjct: 241 YDYQAKYVDNKimaiPADVDQSIVTEMRSYAEVAFKALGGCGLSRCDFFLTQDGQV^ 300 

Query: 301 ELWIMPGFTQWSMYPLLWEmGLTySDLIEKLVMLAKEMFEKRESHLI 348 

ELNTMPGFTQWSMYPLLWENMGL Y DLIE+LV LA+EMF++RESHLI
Sbjct: 301 EUSrrMPGFTQWSMYPLLWENMGLAYPDLlEELVTLAQEMFDQRESHLI 348 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2210 

A DNA sequence (GBSx2329) was identified in S.agalactiae <SEQ ID 6823> which encodes the amino 
acid sequence <SEQ ID 6824>. This protein is predicted to be recombination protein (recR). Analysis of 
this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2540 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44615 GB:U58210 RecM [Streptococcus thermophilus] 
Identities = 181/198 (91%) , Positives = 189/198 (95%) 

Query: 1 MLYPTPIAKLIDSFSKIPGIGTKTATRLAFYTIGMSDEDVNEFAKNLLRAKRELTYCSVC 60 

MLYPTPIAKLIDSFSKLPGIG KTATRLAFYTI MSDEDVN+FAKNLLAAKRELTYCSVC 
Sbjct: 1 MLYPTPIAKLIDSFSKLPGIGAKTATRLAFYTISMSDEDVNDFAKNIiIAAKRELTYCSVC 60 

Query: 61 GNLTDDDPCLICTDKTKDQSVILWEDSKDVSftMEKlQEYNGLYHVLHGLISPMNGISPD 120

G LTDDDPC+ICTD+TRD++ ILWEDSKDVSAMEKIQEY GLYHVL GLISPMNG+ PD 
Sbjct: 61 GRLTDDDPCIICTDETRDRTKILWEDSKDVSfiMEKIQEYRGLYHVLQGLISPMNGVGPD 120 

Query: 121 DINLKSLITRLMDGQVTEVIVATNATftlXSEATSMYISRVLBaPAeiKViraj^ 180 

DINLKSLITRLMD +V EVI+ATNATADGEATSMYISRVLKPAGIKVTRLARGLAVGSDI 
Sbjct: 121 DINLKSLITRLMDSEVDEVimTNATADGEATSMYISRVLKPAGIKVTRLaRGLAVGSDI 180 

Query: 181 EYADEVTLLRAIENRTEL 198 

EYADEVTLLRAIENRTEL 
Sbjct: 181 EYADEVTLLRAIENRTEL 198 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6825> which encodes the amino acid 
sequence <SEQ ID 6826>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2652 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 180/198 (90%) , Positives = 192/198 (96%) 

Query: 1 MLYPTPIAKLIDSFSKLPGIGTKTATRLAFYTIGMSDEDVNEFAKNLLAAKRELTYCSVC 60

+LYPTPIAKLIDS+SKLPGIG KTATRLAFYTIGMS+EDVN+FAKNLLAAKRELTYCS+C
Sbjct: 1 VLYPTPIAKLIDSYSKLPGIGIKTATRLAFYTIGMSNEDVNDFAKNLLAAKRELTYCSIC 60

Query: 61 GNLTDDDPCLICTDKTRDQSVILVVEDSKDVSAMEKIQEYNGLYHVLHGLISPMNGISPD 120

GNLTDDDPC ICTD +RDQ+ ILVVED+KDVSAMEKIQEY+G YHVLHGLISPMNG+ PD
Sbjct: 61 GNLTDDDPCHICTDTSRDQTTILVVEDAKDVSAMEKIQEYHGYYHVLHGLISPMNGVGPD 120

Query: 121 DINLKSLITRLMDGQVTEVIVATNATADGEATSMYISRVLKPAGIKVTRLARGLAVGSDI 180

DINLKSLITRLMDG+V+EVIVATNATADGEATSMYISRVLKPAGIKVTRLARGLAVGSDI
Sbjct: 121 DINLKSLITRLMDGKVSEVIVATNATADGEATSMYISRVLKPAGIKVTRLARGLAVGSDI 180

Query: 181 EYADEVTLLRAIENRTEL 198

EYADEVTLLRAIENRTEL
Sbjct: 181 EYADEVTLLRAIENRTEL 198 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2211 

A DNA sequence (GBSx2330) was identified in S.agalactiae <SEQ ID 6827> which encodes the amino 
acid sequence <SEQ ID 6828>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3144 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2212

A DNA sequence (GBSx2331) was identified in S.agalactiae <SEQ ID 6829> which encodes the amino
acid sequence <SEQ ID 6830>. This protein is predicted to be penicillin-binding protein 2b. Analysis of this
protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.69 Transmembrane 23 - 39 ( 17 - 46)

Final Results

bacterial membrane --- Certainty=0.6477 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC44614 GB:U58210 penicillin-binding protein 2b [Streptococcus thermophilus]
Identities = 341/683 (49%), Positives = 477/683 (68%), Gaps = 12/683 (1%)

Query: 4 RKKRYRLTVKKQNASIPRRLNLLFFIIVLLFTVLILRLEQMQIGQQSFYMKKLTALTSYT 63

++K R ++ +I RR+ LLF ++ +LF +L RL MQ+ +SFY KKL + YT
Sbjct: 18 KRKEKRANKPRKPVNISRRVYLLFGVVFVLFLLLFARLTYMQVYNKSFYTKKLEDNSKYT 77

Query: 64 VKESKARGQIFDAKGVVLVENDERPTVAFSRGNKISSQSIKELANKLSHYITLTEVASSD 123

V+ + RGQIFDAKG+ L N + + F+R N +SS ++K +A +L+ +TLTE +D
Sbjct: 78 VRIASERGQIFDAKGIALTTNQSKDVITFTRSNLVSSDTMKSVAERLATLVTLTETKVTD 137

Query: 124 RAKRDYYLADKANYKKVVESLPDSKRYDKFG^raLAESTVYANAVAAVPVSAINYSEDELK 183

R KR++YLAD ANYK+VV LP+ K+ DKFGN LAE+T+Y NA+ AVP A++YSEDELK
Sbjct: 138 RQKREFYLADSANYKRVViroLPNDKKTDKFGNKLAEATIYMNAINAVPDEAVDYSEDELK 197


Query: 184 VVALFNQMNATPTFGSVKLSTGELSDDQIKKLDADKKELLGISVTSNWHR^^ 243

+V +++ MNA F +V L T +L+ DQI + A +KEL GI V +W R +SLS +
Sbjct: 198 IVYIYSHMNAVSNFSTVILKTADLTPDQIAIVAAKQKELNGIRVAKDWERHTSDSSLSPL 257

Query: 244 LGTISTEKAGLPREEVKKYLKKGYSLNDRVGTSYLEKQYEDDLQGIRQIRKVVVNKKGKV 303

+G +S+ +AGLP+E+ K YLKKGY+LNDRVGTSYLEK+YE++LQG +R++ V+K+GKV
Sbjct: 258 IGRVSSSEAGLPQEDAKDYLKKGYALNDRVGTSYLEKEYEEELQGKHTVREITVDKEGKV 317



Query: 304 VSDNITQEGKSGRNLKLTIDLNYQNKVESILKQYYGSELSSGRASFSEGMYAVAIEPSTG 363

SD I Q+G G NLKLTIDL++Q VE IL Q SE+S +A++SEGMYAV + TG
Sbjct: 318 DSDKIIQKGSKGINLKLTIDLDFQKGVEDILGQQLSSEISGNKATYSEGMYAVVMNADTG 377

Query: 364 KVLAMAGLKNDHG--NLVDDSLGTIAKNFTPGSVVKGATLSSGWENKVLRGNEVLYDQEI 421

VLAMAG K++ G + D+LGTI FTPGSVVKGATL++GW + + G++VL DQ I
Sbjct: 378 AVLAMAGQKHEQGAQDFKADALGTITDVFTPGSVVKGATLTAGWRSGAIYGDQVLTDQPI 437



Query: 422 ANIRSWFT-RGLTPISAAQALEYSSNTYMVQVALRLMGQDYNTGDALTDRGYQEA 475 

I SWFT +G I+A QALEYSSNTYMVQ+A++ +GQ Y G +L+ ++A
Sbjct: 438 NIASSPPITSWFTDKGSRAITATQALEYSSNTYMVQIAIKRLGQQYVPGMSLSTDNMEKA 497 

Query: 476 MAKLRKTYGEYGLGVSTGLDLP-ESEGYVPGKYSLGTTLMESFGQYDAYTPMQLGQYIST 534 

M LR TY E+G+GVSTGLDLP ESEGY+P Y++ L E+FGQYD+YT +QL QY+++ 
Sbjct: 498 MTTLRDTYAEFGMGVSTGLDLPGESEGYIPKimiVAlWLTEAFGQYDSYTTIQLAQYVAS 557

Query: 535 IANNGNRIAPHWSDIYEGNDSNKFAQXjTOSITPKTLNKIAISDQELAIIQEGFYNVVNS 594 

IAN G R+APH+V IY+ + L ++ + LNK+++ ++L IIQ+GF++VVNS 

Sbjct: 558 IANGGKRVRPHIVGGIYDAGKNGSLGTLSSTVDTRVLNKLSLDSKQLGIIQQGFHDVVNS 617 

Query: 595 GSGYATGTSMRQim'TISGKTGTAETFAKimrGQTVSTYiraSlAIAYDT^ 651 

GS ATG +M ++ ISGKTGTAET+A + +G +V+T NtNA+AY T + K+AV +M 
Sbjct: 618 GSSLATGKZU«ASSIIPISGKTGTAETYATIX3SGNSVTTVNimVAYATAKDGTK]A^ 677 

Query: 652 YPHVTTDTTKSHQLVARDMIDQY 674 

YPH +K+HQ + +++ Y 

Sbjct: 678 YPHALDWKSKAHQNAVKAIMELY 700



A related GBS gene <SEQ ID 8997> and protein <SEQ ID 8998> were also identified. Analysis of this 
protein sequence reveals the following:

Lipop Possible site: -1 Crend: 8 
McG: Discrim Score: -12.38 
GvH: Signal Score (-7.5): -5.9 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -12.42 threshold: 0.0 

INTEGRAL Likelihood =-12.42 Transmembrane 23 - 39 ( 18 - 46) 
PERIPHERAL Likelihood =4.56 355 
modified ALOM score: 2.98 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.5967 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

50.5/71.3% over 683aa Streptococcus thermophilus

GP|1685112| penicillin-binding protein 2b Insert characterized
ORF02276(307 - 2322 of 2643)

GP|1685112|gb|AAC44614.1|U58210(17 - 700 of 704) penicillin-binding protein 2b
{Streptococcus thermophilus}
%Match = 38.5
%Identity = 50.4 %Similarity = 71.2

Matches = 342 Mismatches = 189 Conservative Sub.s = 141

108 138 168 198 228 258 288 318 

mGR*NS*LPTTCFRI**KIKPCFRIIiLR*II*SLYKKFRPS»aiEFFIi™iLSVCKKPFL*YNSSQSFYSKELMLlTOK^

MTSFWEKNSQKWKKWRQKRK 
10 20 

348 378 408 438 468 498 528 558

RYRLTVKKQNAS IPRRLNLLFFl IXLLFTVLILRLEQMQIGQQSFYMKKLTALTSYTVKESKARGQI FDAKGWLVENDE 

I == =1 1|: 111 =11 =1 II 11= =111 III = 111= = lllllllll: I I 

EKRftNKPRKPVNISRRVYLLFGVVFVI.FLLLFARLTYMQVYNKSFYTKKLEDNS^ 

30 40 50 60 70 80 90 100 


588 618 648 678 708 738 768 798 

RPTVRFSRGNNISSQSIKELANKlSHYITLTEVZVSSDRAKRDYYLftDKaNYKKVVESLPDSK^ 

: : hi I :|l -I =1 =h =1111 =11 lllhll lb h lllll llhhl II 

KDVITFTRSNLVSSDTMKSVAERLATLOTLTETKVTDRQKREFYLftDSaNYKRVViroLPNDKKTD^^ 
110 120 130 140 150 160 170 180



828 858 888 918 948 978 1008 1038 

VAAVPVSAINYSEDELKVVMiFNQMATPTFGSVKLSTGELSDDQIKKLIJADKKELLGISVTSNWH^ 

= III |: = llllllhl : = ::|ll . | :|| | :|: III = I :||l II I =11 =111 = = 1 

INaVPDEAVDYSEDELKIVYIYSHMNAVSNFSTVILKTaDLTPDQIAIVAAKQKELNGIRVAKDtiTERHTSDSSLSPLIGR 
190 200 210 220 230 240 250 260 



1068 1098 1128 1158 1188 1218 1248 1278 

ISTEKAGLPREEVKKYLKKGYSLITORVGTSYriEKQYEDDLQGIRQIRKOTWKKGKWSDNITQEGKSGRmKLTIDIOT 

=|: =lll|:|= I lllll|:|llllllllll|:||::|ll :|:= 1=1=111 II I 1=1 I 11111111== 
VSSSEftGLPQEnAKDYLKKGYAIM3RTOTSYI^KEYEEELQGKHTVREITVT)KEGKTO 

270 280 290 300 310 320 330 340 



1308 1338 1368 1393 1428 1452 1482 1512 

QNKVESILKQYYGSELSSGRASFSEGMYAVAIEPSTGKVIAMAGLKNDHG--NLVDDSLGTIAKNFTPGSVVKGATLSSG

I II II I 11 = 1 :|::lllllll = II llllll l-^l == 1 = 1111 I I I I I 1 1 I I I I I = = 1 
QKGVEDILGCMIiSSEISGNKATYSEGMYAVVMNaDTGAVIAMAGQKHEQGaQDFKA^^ 

350 360 370 380 390 400 410 420 

1542 1566 1587 1614 1644 1674 1704 1734

WENKVLRGNEVLYDQ- -EIAN- - - IRSWFT-RGLTPISAAQALEYSSNTYMVQVALRLMGQDYNTGDALTDRGYQEAMAK 






WRSGAIYGDQVLTDQPINIASSPPITSWFTDKGSRAITATQAIiEYSSNTyMVQIAIKRLGQQYVPGMSLSTDNMEKAMTT 
430 440 450 460 470 480 490 500 



1764 1821 1851 1881 1911 1941 1971 

LRKTYGEYGLGVSTGLDLP-ESEGYVPGKYSLGTTimSFGQYDRYTPMQLGQYIS 

II II l = |:|||llllll llllhl l = : I hllllMI =11 ll = = = lll I 1 = 111 = 1 1|: = 
IiRDTY3ffiFGMGVSTGIJ3LPGESEGYIPKiraiVBNVLTEAFGQYDSYTTIQIAQYVASI^ 
510 520 530 540 550 560 570 580



2001 2031 2061 2091 2121 2151 2181 2211 

KFAQLTOSlTPKTtNKIAISDQEIAIIQEGFYNVVNSGSGYATGTSMRGim^TISGKTGTAETFAKISrVNGQWSTYI^^ 

= I == : 111=== ==1 iii = ii==iiiiii III =1 == iiiiiiiiihi = =1 =1 = 1 nil 

SLGTLSSTVDTRViam^SIJDSKQIiGIIQQGFHDVVNSGSSIATGKaMASSIIPISGKTGTAETO^

590 600 610 620 630 640 650 660 



2252 2292 2322 2352 2382 2412 2442 

lAYDTNR- - -KIAVAVMYPHVTTDTTKSHQLVARDMIDQYISQFTGQ*ERTPECFTQHQIiLN*LTAFQimV*VLKQ^ 
:|| 1 : 1:|| :|1|| :1:|| = ::: | =

VAYATAKDGTKLAVGI^«PHaLDWKSKAHQNAVKAIMELYQNTH 
670 680 690 700 

SEQ ID 8998 (GBS292) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 68 (lane 9; MW 103kDa).

GBS292-GST was purified as shown in Figure 211, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2213 

A DNA sequence (GBSx2332) was identified in S.agalactiae <SEQ ID 6831> which encodes the amino
acid sequence <SEQ ID 6832>. Analysis of this protein sequence reveals the following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2644 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB51328 GB:AJ131985 phosphoglyceromutase [Streptococcus pneumoniae]

Identities = 219/230 (95%), Positives = 226/230 (98%)

Query: 1 MVKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKR 60

MVKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLI+ AGI+FD A+TSVLKR
Sbjct: 1 MVKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIKEAGIKFDQAYTSVLKR 60

Query: 61 AIKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVL 120

AIKTTNLALEA+DQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVL
Sbjct: 61 AIKTTNLALEASDQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVL 120

Query: 121 PPDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVG 180

PP+M +DDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVG
Sbjct: 121 PPNMDRDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVG 180

Query: 181 AHGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLGK 230

AHGNSIRALVKHIK LSDDEIMDVEIPNFPPLVFEFDEKLN+VSEYYLGK
Sbjct: 181 AHGNSIRALVKHIKGLSDDEIMDVEIPNFPPLVFEFDEKLNVVSEYYLGK 230

A related DNA sequence was identified in S.pyogenes <SEQ ID 6833> which encodes the amino acid
sequence <SEQ ID 6834>. Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2646 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 206/229 (89%), Positives = 214/229 (92%)

Query: 1 MVKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIQAAGIEFDLAFTSVLKR 60

MVKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLI+ AGIEFDLAFTSVL R
Sbjct: 1 MVKLVFARHGESEWNKANLFTGWADVDLSEKGTQQAIDAGKLIKEAGIEFDI^ 60

Query: 61 AIKTTNLALEAADQLWVPVEKSWRLNERHYGGLTGKNKAEAAEQFGDEQVHIWRRSYDVL 120

AIKTTNLALE A QLWVP EKSWRLNERHYG LTGKNKAEAAEQF DEQVHIWRRSYDVL
Sbjct: 61 AIKTTNLALENAGQLWVPTEKSWRLNERHYGALTGKNKAEAAEQFCDEQVHIWRRSYDVL 120

Query: 121 PPDMAKDDEHSAHTDRRYASLDDSVIPDAENLKVTLERALPFWEDKIAPALKDGKNVFVG 180

PP MAKDDE+SAH DRRYA LD ++IPDAENLKVTLERA+P+WE+KIAPAL DGKNVFVG
Sbjct: 121 PPAMAKDDEYSAHKDRRYADLDPALIPDAENLKVTLERAMPYWEEKIAPALLDGKNVFVG 180

Query: 181 AHGNSIRALVKHIKQLSDDEIMDVEIPNFPPLVFEFDEKLNLVSEYYLG 229

AHGNSIRALVKHIK LSDDEIMDVEIPNFPPLVFE DEKLN+V EYYLG
Sbjct: 181 AHGNSIRALVKHIKGLSDDEIMDVEIPNFPPLVFELDEKLNIVKEYYLG 229

SEQ ID 6832 (GBS110) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 38 (lane 8; MW 28.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 10; MW 53.9kDa). 

The GBS110-GST fusion product was purified (Figure 204, lane 5) and used to immunise mice. The
resulting antiserum was used for Western blot (Figure 252A), FACS (Figure 252B), and in the in vivo
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS 
bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2214 

A DNA sequence (GBSx2333) was identified in S.agalactiae <SEQ ID 6835> which encodes the amino 
acid sequence <SEQ ID 6836>. This protein is predicted to be triosephosphate isomerase (tpiA). Analysis of 
this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.37 Transmembrane 36 - 52 ( 36 - 52) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC43268 GB:U07640 triosephosphate isomerase [Lactococcus lactis]

Identities = 164/252 (65%), Positives = 202/252 (80%)



Query: 1 MSRKPFIAGNWKMNKNPEEAKAFIEAVASKLPSSELVEAGIAAPALTLSTVLEAAKGSEL 60

MSRKP IAGNWKMNK   EA+AF+EAV + LPSS+ VE+ I APAL L+ +    +GSEL
Sbjct: 1 MSRKPIIAGNWKMNKTLSEAQAFVEAVKNNLPSSDNVESVIGAPALFLAPMAYLRQGSEL 60

Query: 61 KIAAQNSYFENSGAFTGENSPKVLAEMGTDYVVIGHSERRDYFHETDQDINKKAKAIFAN 120

K+AA+NSYFEN+GAFTGENSP  + ++G +Y++IGHSERR+YFHETD+DINKKAKAIFA
Sbjct: 61 KLAAENSYFENAGAFTGENSPAAIVDLGIEYIIIGHSERREYFHETDEDINKKAKAIFAA 120

Query: 121 GLTPIICCGESLETYEAGKAVEFVGAQVSAALAGLSEEQVSSLVIAYEPIWAIGTGKSAT 180

G TPI+CCGE+LET+EAGK  E+V  Q+ A LAGL+ EQVS+LVIAYEPIWAIGTGK+AT
Sbjct: 121 GATPILCCGETLETFEAGKTAEWVSGQIEAGLAGLTAEQVSNLVIAYEPIWAIGTGKTAT 180

Query: 181 QDDAQNMCKAVRDVVAADFGQAVADKVRVQYGGSVKPENVAEYMACPDVDGALVGGASLE 240

 + A   C  VR  V   +G+ V++ VR+QYGGSVKPE +   MA  ++DGALVGGASLE
Sbjct: 181 NEIADETCGVVRSTVEKLYGKEVSEAVRIQYGGSVKPETIEGLMAKENIDGALVGGASLE 240

Query: 241 AESFLALLDFVK 252

A+SFLALL+  K
Sbjct: 241 ADSFLALLEMYK 252





A related DNA sequence was identified in S.pyogenes <SEQ ID 6837> which encodes the amino acid 
sequence <SEQ ID 6838>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.81 Transmembrane 36 - 52 ( 36 - 52)

Final Results 

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 220/251 (87%), Positives = 237/251 (93%)



Query: 1 MSRKPFIAGNWKMNKNPEEAKAFIEAVASKLPSSELVEAGIAAPALTLSTVLEAAKGSEL 60

MSRKP IAGNWKMNKNP+EAKAF+EAVASKLPS++LV+  +AAPA+ L T +EAAK S L
Sbjct: 1 MSRKPIIAGNWKMNKNPQEAKAFVEAVASKLPSTDLVDVAVAAPAVDLVTTIEAAKDSVL 60

Query: 61 KIAAQNSYFENSGAFTGENSPKVLAEMGTDYVVIGHSERRDYFHETDQDINKKAKAIFAN 120

K+AAQN YFEN+GAFTGE SPKVLAEMG DYVVIGHSERRDYFHETD+DINKKAKAIFAN
Sbjct: 61 KVAAQNCYFENTGAFTGETSPKVLAEMGADYVVIGHSERRDYFHETDEDINKKAKAIFAN 120

Query: 121 GLTPIICCGESLETYEAGKAVEFVGAQVSAALAGLSEEQVSSLVIAYEPIWAIGTGKSAT 180

GLTPI+CCGESLETYEAGKAVEFVGAQVSAALAGLS EQV+SLV+AYEPIWAIGTGKSAT
Sbjct: 121 GLTPIVCCGESLETYEAGKAVEFVGAQVSAALAGLSAEQVASLVLAYEPIWAIGTGKSAT 180

Query: 181 QDDAQNMCKAVRDVVAADFGQAVADKVRVQYGGSVKPENVAEYMACPDVDGALVGGASLE 240

QDDAQNMCKAVRDVVAADFGQ VADKVRVQYGGSVKPENV +YMACPDVDGALVGGASLE
Sbjct: 181 QDDAQNMCKAVRDVVAADFGQEVADKVRVQYGGSVKPENVKDYMACPDVDGALVGGASLE 240

Query: 241 AESFLALLDFV 251

A+SFLALLDF+
Sbjct: 241 ADSFLALLDFL 251





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2215 

A DNA sequence (GBSx2334) was identified in S.agalactiae <SEQ ID 6839> which encodes the amino 
acid sequence <SEQ ID 6840>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3050 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB41198 GB:U75481 elongation factor-Tu [Streptococcus mutans]
Identities = 44/45 (97%), Positives = 45/45 (99%)

Query: 1 MVMPGDNVTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 45

MVMPGDNVTI+VELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA
Sbjct: 117 MVMPGDNVTIDVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 161 

There is also homology to SEQ ID 1022: 

Identities = 44/45 (97%) , Positives = 44/45 (97%) 

Query: 1 MVMPGDNVTIEVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 45 

MVMPGDNVTI VELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 
Sbjct: 371 MVMPGDNVTINVELIHPIAVEQGTTFSIREGGRTVGSGIVSEIEA 415 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2216 

A DNA sequence (GBSx2335) was identified in S.agalactiae <SEQ ID 6841> which encodes the amino 
acid sequence <SEQ ID 6842>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.66 Transmembrane 81 - 97 ( 80 - 97)
INTEGRAL Likelihood = -2.60 Transmembrane 18 - 34 ( 17 - 34) 



Final Results

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2217

A DNA sequence (GBSx2336) was identified in S.agalactiae <SEQ ID 6843> which encodes the amino 
acid sequence <SEQ ID 6844>. Analysis of this protein sequence reveals the following: 



Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0596 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 2218

A DNA sequence (GBSx2337) was identified in S.agalactiae <SEQ ID 6845> which encodes the amino 
acid sequence <SEQ ID 6846>. Analysis of this protein sequence reveals the following: 



Possible site: 14 

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.3559 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 



